| Alignment partition | Run name | Rate multiplier | Sites |
|---|---|---|---|
| 3ad39b9161154796bf2aef1aa746b469.phy | fastest1nocalib | 4.677767 | 2697 |
| 5fad91d04084ac4043f35092e579c27f.phy | fastest2nocalib | 5.093961 | 2036 |
| 64a99fde30f869ec30f91c8d2f3fe6fe.phy | fastest3nocalib | 3.848822 | 2987 |
| 4ff6dce0b7cebc9428b4b77e79356484.phy | largest1nocalib | 0.267668 | 30803 |
| ccf55a6ee6d62f840a124bcc0c98ecf5.phy | largest2nocalib | 0.010000 | 132188 |
| fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy | largest3nocalib | 0.312974 | 12916 |
In order to treat the maximum ages specified in the calibration file as soft bounds, the prior must be set to the birth-death process according to the manual. However, if the birth-death prior is combined with a set of calibrations, its hyperparameters must be fixed to specific values rather than estimated from the data. The PhyloBayes manual recommends running a preliminary analysis without calibrations and with free hyperparameters. The resulting estimates can then be used to perform a calibrated analysis.
(Remember: the supplied tree may be unrooted or rooted, but must be in the NEWICK format. If the first line contains information about the number of tips rather than the topology, the spaces will cause segmentation faults.)
../pb -d 3ad39b9161154796bf2aef1aa746b469.phy -T 12_no_outgroups_scheme_3.tre -r acanth.outgroup -ln -bd fastest1nocalib
(In this case, the outgroup file is superfluous, since the tree is already rooted. Therefore, the following 5 runs do not make use of the -r option. This should not affect the results in any way.)
../pb -d 5fad91d04084ac4043f35092e579c27f.phy -T 12_no_outgroups_scheme_3.tre -ln -bd fastest2nocalib
../pb -d 64a99fde30f869ec30f91c8d2f3fe6fe.phy -T 12_no_outgroups_scheme_3.tre -ln -bd fastest3nocalib
../pb -d 4ff6dce0b7cebc9428b4b77e79356484.phy -T 12_no_outgroups_scheme_3.tre -ln -bd largest1nocalib
../pb -d ccf55a6ee6d62f840a124bcc0c98ecf5.phy -T 12_no_outgroups_scheme_3.tre -ln -bd largest2nocalib
../pb -d fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy -T 12_no_outgroups_scheme_3.tre -ln -bd largest3nocalib
Example of a Tracer-compatible trace file generated by PhyloBayes:
## X.cycle X.treegen time loglik length sigma mu meanrates
## 1 1 0 3 -132220 3.02257 6.17254 0.1128120 1.21207
## 2 2 0 6 -130045 3.53405 5.81433 0.0798298 1.98474
## 3 3 0 8 -129011 3.85674 6.96148 0.2372700 0.73773
## 4 4 0 11 -128331 4.08829 7.05736 0.1749990 1.09214
## 5 5 0 14 -127865 4.30031 6.06568 0.1890880 1.04218
## 6 6 0 17 -127540 4.52842 7.25878 0.1826010 1.09504
## p1 p2 alpha nmode stat statalpha kappa allocent
## 1 3.46031 1.498980 0.289504 81 1.16055 5.01021 14.5527 3.48840
## 2 6.54925 0.899936 0.253237 82 1.15225 5.87930 15.6545 3.53462
## 3 3.74662 1.888960 0.259104 82 1.14826 5.31522 15.0681 3.53537
## 4 3.53956 1.431250 0.267635 81 1.14046 5.84142 14.7984 3.56157
## 5 3.48121 2.049510 0.277655 81 1.13575 5.20172 16.4106 3.56845
## 6 4.57659 1.032550 0.293295 75 1.13833 5.18068 13.8315 3.54547
Progress check: Friday 11/18/2016, 1:05 PM (fastest1nocalib), 1:41 PM (fastest2nocalib), 1:49 PM (fastest3nocalib), 2:01 PM (largest1nocalib), 2:05 PM (largest2nocalib), 1:54 PM (largest3nocalib). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:
| parameter | fast1 | fast2 | fast3 | large1 | large2 | large3 |
|---|---|---|---|---|---|---|
| states | 101564 | 96979 | 71418 | 8375 | 2039 | 19871 |
| loglik | 1325 | 2310 | 780 | 45 | 3 | 30 |
| length | 2235 | 2744 | 1292 | 2505 | 469 | 2543 |
| sigma | 4396 | 3590 | 2882 | 11 | 16 | 15 |
| mu | 1149 | 1103 | 535 | 22 | 265 | 32 |
| meanrates | 1345 | 1470 | 512 | 79 | 107 | 239 |
| p1 | 1102 | 1032 | 573 | 39 | 314 | 56 |
| p2 | 285 | 229 | 243 | 46 | 22 | 1074 |
| alpha | 6225 | 5192 | 1693 | 6200 | 150 | 14430 |
| nmode | 342 | 389 | 371 | 30 | 42 | 84 |
| stat | 3206 | 3086 | 1337 | 49 | 3 | 31 |
| statalpha | 34753 | 4021 | 7103 | 4140 | 206 | 1286 |
| kappa | 583 | 556 | 543 | 101 | 129 | 198 |
| allocent | 170 | 251 | 171 | 3 | 3 | 46 |
Progress check: Thursday 11/24/2016, 00:13 (fastest1nocalib), 00:15 (fastest2nocalib), 00:17 (fastest3nocalib), 00:24 (largest1nocalib), 00:25 (largest2nocalib), 00:25 (largest3nocalib). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:
| parameter | fast1 | fast2 | fast3 | large1 | large2 | large3 |
|---|---|---|---|---|---|---|
| states | 220907 | 246584 | 182374 | 21771 | 5314 | 50975 |
| loglik | 3442 | 5831 | 2456 | 51 | 4 | 74 |
| length | 4701 | 6967 | 2208 | 6137 | 1264 | 6452 |
| sigma | 6426 | 9339 | 3262 | 3619 | 44 | 112 |
| mu | 2604 | 2827 | 1139 | 618 | 462 | 134 |
| meanrates | 2376 | 3012 | 1560 | 638 | 57 | 111 |
| p1 | 1573 | 1957 | 706 | 3219 | 860 | 818 |
| p2 | 394 | 355 | 296 | 6658 | 90 | 4557 |
| alpha | 13401 | 13734 | 1879 | 14922 | 382 | 37241 |
| nmode | 947 | 1183 | 964 | 59 | 4 | 222 |
| stat | 6172 | 6952 | 3126 | 56 | 4 | 76 |
| statalpha | 73688 | 8656 | 12776 | 9557 | 53 | 4973 |
| kappa | 1503 | 1689 | 1464 | 335 | 6 | 836 |
| allocent | 462 | 706 | 478 | 10 | 3 | 85 |
Progress check of the unconverged analyses: Friday 12/05/2016, 11:12 PM (largest1nocalib), 11:13 (largest2nocalib), 11:11 PM (largest3nocalib). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:
| parameter | largest1 | largest2 | largest3 |
|---|---|---|---|
| states | 43950 | 10602 | 102442 |
| loglik | 143 | 4 | 133 |
| length | 11768 | 2716 | 11510 |
| sigma | 7609 | 80 | 213 |
| mu | 1125 | 739 | 264 |
| meanrates | 1145 | 130 | 217 |
| p1 | 6205 | 1290 | 1603 |
| p2 | 12956 | 213 | 8595 |
| alpha | 30927 | 718 | 75924 |
| nmode | 112 | 6 | 469 |
| stat | 155 | 4 | 136 |
| statalpha | 19358 | 32 | 10492 |
| kappa | 286 | 9 | 1391 |
| allocent | 27 | 4 | 136 |
A detailed examination of the table above shows that the parameters for which the chain failed to converge to the posterior are almost exclusively those related to the site-heterogeneous mixture model. This is to be expected given the comparatively large number of sites in the three partitions. In contrast, the only two parameters of direct interest – p1 and p2 – have reached sufficient effective sample sizes in all the 3 analyses. This suggests it might be worthwhile to switch to another approach in order to speed up the analysis, and run the uncalibrated analyses of the 3 largest partitions under a simpler model with just one equlibrium frequency profile (such as GTR+\(\Gamma_4\)) instead of the default mixture of F81 models. Note that since the birth-death process of speciation and extinction is independent of the nucleotide substitution process, it does not matter that p1 and p2 were estimated under CAT-F81 for the 3 fastest partitions and under GTR+\(\Gamma\) for the 3 largest partitions. Therefore, the 3 chains listed above (largest1nocalib, largest2nocalib, largest3nocalib) were soft-stopped, and 3 new chains were started as follows:
../pb -d 4ff6dce0b7cebc9428b4b77e79356484.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 largest1nocal2
../pb -d ccf55a6ee6d62f840a124bcc0c98ecf5.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 largest2nocal2
../pb -d fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 largest3nocal2
Progress check of the unconverged analyses: Friday 12/17/2016, 01:46 PM (largest1nocal2), 01:48 (largest2nocal2), 01:48 PM (largest3nocal2). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:
| parameter | largest1 | largest2 | largest3 |
|---|---|---|---|
| states | 4420 | 1077 | 9970 |
| loglik | 158 | 944 | 1318 |
| length | 2972 | 965 | 2848 |
| sigma | 104 | 86 | 113 |
| mu | 242 | 869 | 483 |
| meanrates | 277 | 555 | 150 |
| p1 | 717 | 505 | 1497 |
| p2 | 740 | 150 | 5781 |
| alpha | 2982 | 64 | 7534 |
| nmode | - | - | - |
| stat | 2317 | 969 | 4357 |
| statalpha | - | - | - |
| rrent | 2303 | 797 | 4480 |
| meanrr | 326 | 525 | 505 |
| kappa | - | - | – |
| allocent | - | - | – |
Progress check of the unconverged analyses: Friday 12/23/2016, 03:01 AM (largest1nocal2, largest2nocal2, largest3nocal2). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:
| parameter | largest1 | largest2 | largest3 |
|---|---|---|---|
| states | 7167 | 1731 | 16073 |
| loglik | 3995 | 1506 | 2364 |
| length | 5144 | 1551 | 4617 |
| sigma | 5386 | 148 | 185 |
| mu | 476 | 1428 | 715 |
| meanrates | 1779 | 492 | 206 |
| p1 | 4364 | 840 | 2487 |
| p2 | 3998 | 212 | 8591 |
| alpha | 4695 | 118 | 12361 |
| nmode | - | - | - |
| stat | 3967 | 1557 | 6673 |
| statalpha | - | - | - |
| rrent | 4575 | 951 | 7184 |
| meanrr | 539 | 787 | 915 |
| kappa | - | - | – |
| allocent | - | - | – |
The analyses that reached convergence in all of their parameters were then “soft-stopped” using the following commands:
../stoppb fastest1nocalib
../stoppb fastest2nocalib
../stoppb fastest3nocalib
../stoppb largest1nocal2
../stoppb largest3nocal2
The chains stopped at 250673, 282838, 209815, 7176, and 17588 generations (“points”), respectively. In order to obtain reliable estimates of the two parameters of interest (\(p_1 = \lambda - \mu\) = birth rate minus death rate; \(p_2 = \lambda\omega\) = birth rate times sampling rate), the chain was subjected to two different analyses:
Using the default readdiv program from the PhyloBayes package. For the fastest partitions, the first 10,000 trees were discarded as burnin and only every 500th tree was included to minimize autocorrelation. For the largest1 chain, which was two orders of magnitude shorter, the first 1000 trees were discarded and the subsampling frequency was set equal to 30. The largest3 chain was assigned a burnin of 2000 generations, with the ../readdiv analysis sampling every 50th tree:
../readdiv -x 10000 500 fastest1nocalib
../readdiv -x 10000 500 fastest2nocalib
../readdiv -x 10000 500 fastest3nocalib
../readdiv -x 1000 30 largest1nocal2
../readdiv -x 2000 50 largest3nocal2Using Tracer v1.6, in which the first tenth of the collected samples is automatically discarded as burnin and posterior means as well as effective sample sizes (ESS) are automatically computed from the rest. Both the means and the effective sizes are reported below.
| chain | points (readdiv) | p1 (readdiv) | p2 (readdiv) | p1 ESS | p2 ESS | p1 (Tracer) | p2 (Tracer) |
|---|---|---|---|---|---|---|---|
| fastest1 | 481 | 0.0018922 +/- 0.00139346 | 0.00148774 +/- 0.00043172 | 1835 | 489 | 1.798 | 1.45 |
| fastest2 | 545 | 0.00141146 +/- 0.00120643 | 0.00153467 +/- 0.00047274 | 2327 | 420 | 1.405 | 1.553 |
| fastest3 | 399 | 0.00195453 +/- 0.00146607 | 0.00148684 +/- 0.000455553 | 886 | 363 | 1.987 | 1.474 |
| largest1 | 205 | 0.0172977 +/- 0.00212044 | 1.48854e-10 +/- 4.22314e-10 | 4380 | 4002 | 17.491 | 1.139e-7 |
| largest3 | 311 | 0.0162411 +/- 0.00201529 | 2.93377e-10 +/- 7.18424e-10 | 3043 | 8995 | 16.387 | 3.149e-7 |
As can be seen from the above, the estimates from readdiv and Tracer are highly similar to each other (except for scaling). To ensure compatibility, the readdiv scaling will be used for the calibrated analyses.
According to the manual, the prior on the root age is automatically determined from the birth-death model when the latter is used as the tree prior. However, the resulting distribution is reported to be “proper, but apparently unstable”. Therefore, it is recommended to specify the -rp option explicitly, and run an analysis without data (i.e., sampling from the prior) but with calibrations. The results are then used to verify that (1) the distributions for the calibrated nodes correspond to user expectations and (2) the root age prior is sufficiently non-informative.
The root age prior is a gamma distribution. The manual recommends setting its mean and standard deviation to the same value in order to obtain an exponential distribution for which that value is the mean. Here, the mean is set to 130 Ma, which corresponds well to the “soft” (95%) upper bound on the oldest calibration in the tree (Aipichthys: 128.82 Ma). The results of the prior-sampling analysis should therefore approximate the following distribution:
## in R, gamma is specified by shape and rate:
## mean = 130 = shape/rate
## stdev = 130 = sqrt(shape)/rate
## --> shape = 1, rate = 1/130
curve(dgamma(x, 1, 1/130), from=0, to=500,
main = "Suggested prior", xlab = "Age (in Ma)", ylab = "Probability density")
The following calibration file is used for the prior run under the -sb (soft bound) option:
## C1 C2 C3 C4
## 1 12 NA NA
## 2 lampris_guttatus aphredoderus_sayanus 128.82 98.00
## 3 polymixia_lowei aphredoderus_sayanus 128.04 93.90
## 4 percopsis_omiscomycus aphredoderus_sayanus 93.51 63.10
## 5 sargocentron_coruscum2 myripristis_leiognathus 109.29 49.00
## 6 scomber_scombrus scomberomorus_maculatus 95.58 54.17
## 7 pomacanthus_paru chaetodon_ocellatus 59.26 29.62
## 8 epibulus_insidiator scarus_ferrigineus 43.95 11.90
## 9 ogcocephalus_radiatus cryptopsaras_couesii 53.93 49.00
## 10 xenentodon_cancila pterophyllum_leopoldi 80.52 49.00
## 11 parachirus_xenicus poecilopsetta_plinthus 53.88 41.20
## 12 istiophorus_platypterus mene_maculatus 95.64 55.20
## 13 seriola_zonata alepes_kleinii 61.61 49.00
The analyses were started with the commands below. Since the runs only differed in the dataset used, which exercised no influence on the results (as the analyses were only allowed to sample from the prior), the reason for performing 3 separate runs was to ensure that the root age prior was sufficiently stable.
../pb -d 3ad39b9161154796bf2aef1aa746b469.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0018922 0.00148774 -cal calib -sb -rp 130 130 -prior fastest1prior
../pb -d 5fad91d04084ac4043f35092e579c27f.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00141146 0.00153467 -cal calib -sb -rp 130 130 -prior fastest2prior
../pb -d 64a99fde30f869ec30f91c8d2f3fe6fe.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00195453 0.00148684 -cal calib -sb -rp 130 130 -prior fastest3prior
The chains were soft-stopped at 116276 (fastest1prior), 116063 (fastest2prior), and 109005 (fastest3prior) generations and summarized using ../readdiv. Since the analysis only sampled from the prior, no burnin was specified in the command, and each sample was used for the computation. The command prints basic information (mean, st. dev.) about the shape of the root age distribution to the screen and generates a file with the _sample.dates suffix, containing more detailed information.
| chain | mean root age | root age st. dev. | lower 95% HPD bound | upper 95% HPD bound |
|---|---|---|---|---|
| fastest1prior | 222.134 | 97.1753 | 116.01 | 480.845 |
| fastest2prior | 217.357 | 92.908 | 114.938 | 456.943 |
| fastest3prior | 238.464 | 124.758 | 116.534 | 605.302 |
The prior assigns too much probability to excessively old ages. Potentially, this could be mitigated by specifying an exponential such that 95% (or even 99%) of the total probability mass falls below a certain upper bound (e.g., 143 Ma). However, because the root age prior cannot be offset, such an exponential would assign most of the total probability mass to the “impossible” region below 98 Ma (blue in the figure below) rather than to the desired interval between 98 Ma and 143 Ma (red):
## [1] "Blue (impossible) region probability = 0.957403822374395"
## [1] "Red (desired) region probability = 0.0325961776256052"
Therefore, it is preferable to explicitly specify the root age prior as another calibration in the calibration file, since calibrations (loaded with the -cal command) – unlike the root age prior (specified with the -rp command) – can be offset by a hard minimum (here, 98 Ma):
## C1 C2 C3 C4
## 1 13 NA NA
## 2 lampris_guttatus chaetodon_ocellatus 143.00 98.00
## 3 lampris_guttatus aphredoderus_sayanus 128.82 98.00
## 4 polymixia_lowei aphredoderus_sayanus 128.04 93.90
## 5 percopsis_omiscomycus aphredoderus_sayanus 93.51 63.10
## 6 sargocentron_coruscum2 myripristis_leiognathus 109.29 49.00
## 7 scomber_scombrus scomberomorus_maculatus 95.58 54.17
## 8 pomacanthus_paru chaetodon_ocellatus 59.26 29.62
## 9 epibulus_insidiator scarus_ferrigineus 43.95 11.90
## 10 ogcocephalus_radiatus cryptopsaras_couesii 53.93 49.00
## 11 xenentodon_cancila pterophyllum_leopoldi 80.52 49.00
## 12 parachirus_xenicus poecilopsetta_plinthus 53.88 41.20
## 13 istiophorus_platypterus mene_maculatus 95.64 55.20
## 14 seriola_zonata alepes_kleinii 61.61 49.00
The calibrated PhyloBayes analyses were run under the lognormal relaxed clock model, birth-death tree prior (whose hyperparameters were fixed to the values found in the uncalibrated analyses), and the GTR model with gamma-distributed across-site rate variation (discretized into 4 rate categories):
../pb -d 3ad39b9161154796bf2aef1aa746b469.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0018922 0.00148774 -cal newcalib -sb -gtr -dgam 4 fastest1cali
../pb -d 5fad91d04084ac4043f35092e579c27f.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00141146 0.00153467 -cal newcalib -sb -gtr -dgam 4 fastest2cali
../pb -d 64a99fde30f869ec30f91c8d2f3fe6fe.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00195453 0.00148684 -cal newcalib -sb -gtr -dgam 4 fastest3cali
../pb -d 4ff6dce0b7cebc9428b4b77e79356484.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0172977 0.000000000148854 -cal newcalib -sb -gtr -dgam 4 largest1cali
../pb -d fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0162411 0.000000000293377 -cal newcalib -sb -gtr -dgam 4 largest3cali
Note that:
-gtr option is equivalent to -gtr -ncat 1 – i.e., a single equilibrium frequency profile is applied to the entire dataset.The program assigned the following node numbers to the 13 calibrated nodes, which will be used to refer to their ages in the _sample.dates file:
| Node | Number |
|---|---|
| lampris_guttatus + chaetodon_ocellatus | 118 |
| lampris_guttatus + aphredoderus_sayanus | 119 |
| polymixia_lowei + aphredoderus_sayanus | 123 |
| percopsis_omiscomycus + aphredoderus_sayanus | 124 |
| sargocentron_coruscum2 + myripristis_leiognathus | 132 |
| scomber_scombrus + scomberomorus_maculatus | 162 |
| pomacanthus_paru + chaetodon_ocellatus | 220 |
| epibulus_insidiator + scarus_ferrigineus | 210 |
| ogcocephalus_radiatus + cryptopsaras_couesii | 229 |
| xenentodon_cancila + pterophyllum_leopoldi | 173 |
| parachirus_xenicus + poecilopsetta_plinthus | 189 |
| istiophorus_platypterus + mene_maculatus | 192 |
| seriola_zonata + alepes_kleinii | 195 |
Progress check: Friday 12/02/2016, 4:08 PM (fastest1cali), 4:11 PM (fastest2cali), 4:14 PM (fastest3cali). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:
| parameter | fastest1 | fastest2 | fastest3 |
|---|---|---|---|
| states | 14700 | 16599 | 14004 |
| loglik | 4412 | 5820 | 5523 |
| length | 3413 | 3964 | 3467 |
| sigma | 5210 | 4396 | 4394 |
| mu | 1228 | 1210 | 1016 |
| meanrates | 896 | 987 | 702 |
| p1 | - | - | - |
| p2 | - | - | - |
| scale | 1517 | 1335 | 1549 |
| alpha | 5122 | 4856 | 4390 |
| nmode | - | - | - |
| stat | 3146 | 3478 | 2759 |
| statalpha | - | - | - |
| rrent | 5173 | 5382 | 5896 |
| meanrr | 1147 | 1478 | 1089 |
| kappa | - | - | - |
| allocent | - | - | - |
Progress check: Sunday 01/01/2017, 09:52 AM (largest3cali),
| parameter | largest1 | largest3 |
|---|---|---|
| states | 3882 | 6200 |
| loglik | 29 | 662 |
| length | 2963 | 1828 |
| sigma | 18 | 364 |
| mu | 154 | 446 |
| meanrates | 80 | 452 |
| p1 | - | - |
| p2 | - | - |
| scale | 10 | 238 |
| alpha | 2933 | 4774 |
| nmode | - | - |
| stat | 2056 | 2659 |
| statalpha | - | - |
| rrent | 2552 | 2878 |
| meanrr | 199 | 370 |
| kappa | - | - |
| allocent | - | - |
The missing ESS values for a number of parameters are likely due to the fact that the parameters in questions were not estimated: for example, the birth-death model hyperparameters p1 and p2 were fixed to specific values, while the GTR+\(\Gamma\) model used for the analysis did not include kappa (transition/transversion ratio) or nmode (the mode of the number of distinct equilibrium frequency profiles). The meaning of the remaining two parameters is less clear, but allocent probably refers to the allocation of individual sites to different components of the profile mixture. It is notable that statalpha has an unchanging value of 20.
The first three analyses to reach convergence were soft-stopped using ../stoppb one day later (Saturday 12/03/2016, fastest1cali: 6:45 PM, fastest2cali: 6:50 PM, fastest3cali: 6:51 PM), to make sure that none of the missing ESS values were in fact caused by insufficient sampling for the parameter in question. (This was confirmed by inspecting the respective .trace files.) The largest3 chain was paused immediately after examining its trace.
The chains stopped at 17413, 19628, 16548, and 6215 generations (“points”), respectively. The resulting time trees were then examined in two ways: by running the ../readdiv program and by visualizing the resulting _sample.labels file in FigTree v1.4.2. For the three “fastest” chains, the subsampling frequency was conservatively estimated at 50 – in fact, according to the investigation of the final trace files in Tracer, none of the estimated parameters showed autocorrelation times greater than 15 (fastest1cali, fastest2cali) or 20 (fastest3cali). The burnin was specified to comprise the first tenth of each run, as is the default setting in Tracer. For the largest3 chain, the same general approach was followed, but the subsampling frequency was decreased to 25 to keep the final number of samples above 200. Once again, this frequency was only chosen after making sure that it did not exceed the longest autocorrelation time observed in Tracer. None of the chains showed a clearly defined burnin stage when examined in Tracer; instead of an initial rapid climb, the traces immediately assumed the typical “white noise” pattern indicative of reaching stationarity.
../readdiv -x 174 50 fastest1cali
../readdiv -x 196 50 fastest2cali
../readdiv -x 165 50 fastest3cali
../readdiv -x 622 25 largest3cali
fastest1cali PhyloBayes time tree
fastest2cali PhyloBayes time tree
fastest3cali PhyloBayes time tree
largest3cali PhyloBayes time tree
As mentioned above, it is necessary to check if any of the calibration nodes are younger than allowed by their lower bounds. In the table below, node ages above the upper bound are in italics and ages below the lower bound are in bold.
| Calibration | Lower bound | Upper bound | Node number | Fastest 1 | Fastest 2 | Fastest 3 | Largest 3 |
|---|---|---|---|---|---|---|---|
| Aipichthys | 98.0 | 128.82 | 119 | 128.253 | 128.0939 | 127.2215 | 96.8281 |
| Berybolcensis | 49.0 | 109.29 | 132 | 78.4578 | 80.785 | 81.5117 | 95.212 |
| Calatomus | 11.9 | 43.95 | 210 | 26.9649 | 23.3516 | 26.1327 | 42.5489 |
| Chaetodontidae | 29.62 | 59.26 | 220 | 59.467 | 60.4535 | 59.8 | 58.98 |
| Eobuglossus | 41.2 | 53.88 | 189 | 43.3564 | 42.4537 | 46.1384 | 53.2311 |
| Eucoelopoma | 54.17 | 95.58 | 162 | 60.1789 | 67.7836 | 61.7867 | 91.9956 |
| Homonotichthys | 93.9 | 128.04 | 123 | 118.2912 | 120.7762 | 120.7594 | 94.8645 |
| Mcconichthys | 63.1 | 93.51 | 124 | 49.1689 | 56.5919 | 46.1071 | 92.4408 |
| Ramphexocoetus | 49.0 | 80.52 | 173 | 77.4699 | 75.2663 | 75.334 | 57.8225 |
| Tarkus | 49.0 | 53.93 | 229 | 49.6837 | 50.0037 | 49.5935 | 52.5946 |
| Mene | 55.2 | 95.64 | 192 | 72.4889 | 71.6771 | 71.2218 | 56.8813 |
| Eastmanelepes | 49.0 | 61.61 | 195 | 52.6297 | 51.0873 | 52.7224 | 57.2809 |
Note that the complete information on overflows and underflows (reporting percentages of the 95% HPDs that extend beyond the bounds rather than just the means) is stored in the _sample.calib file generated by readdiv.
A more comprehensive dataset containing 29 calibration points was used to construct a new calibration file. Of these, only 28 points were included in the final file, since the deepest calibration was assigned to the root of the entire tree (including outgroups, which were removed from the phylobayes alignment files). On the other hand, the same root calibration that was used in the previous phase of the analysis was also added to the new calibration file in order to restrict the default improper uniform prior on the root age to a paleontologically realistic time range:
## C1 C2 C3 C4
## 1 29 NA NA
## 2 lampris_guttatus scarus_ferrigineus 143.0 98.000
## 3 anoplogaster_cornuta scarus_ferrigineus 128.0 93.900
## 4 epibulus_insidiator scarus_ferrigineus 63.3 11.900
## 5 balistes_capriscus canthigaster_figueiredoi 69.6 50.500
## 6 takifugu_occelatus canthigaster_figueiredoi 58.5 32.020
## 7 ogcocephalus_radiatus cryptopsaras_couesii 61.6 49.000
## 8 pomacanthus_paru acanthurus_olivaceus 80.8 54.170
## 9 naso_unicornis acanthurus_olivaceus 69.3 49.000
## 10 pomacanthus_paru chaetodon_kleinii 66.0 29.620
## 11 coryphaena_hippurus alepes_kleinii 80.8 54.170
## 12 seriola_zonata alepes_kleinii 69.3 49.000
## 13 istiophorus_platypterus mene_maculatus 94.6 55.200
## 14 citharoides_macrolepis poecilopsetta_plinthus 79.7 49.000
## 15 parachirus_xenicus poecilopsetta_plinthus 66.6 41.200
## 16 centropomus_medius sphyraena_putnamae 79.7 49.000
## 17 xenentodon_cancila pterophyllum_leopoldi 79.7 49.000
## 18 pholidichthys_leucotaenia pterophyllum_leopoldi 67.4 45.460
## 19 syngnathus_fuscus pseudupeneus_maculatus 111.3 69.710
## 20 syngnathus_fuscus aulostomus_sp 93.7 49.000
## 21 scomber_scombrus scomberomorus_maculatus 94.2 54.170
## 22 sargocentron_coruscum2 myripristis_leiognathus 93.7 49.000
## 23 lampris_guttatus gadus_morhua 143.0 98.000
## 24 zeus_faber gadus_morhua 111.3 69.710
## 25 coryphaenoides_rupestris gadus_morhua 94.4 53.700
## 26 zeus_faber zenopsis_conchifera 91.4 32.020
## 27 polymixia_lowei aphredoderus_sayanus 128.0 93.900
## 28 percopsis_omiscomycus aphredoderus_sayanus 109.6 58.551
## 29 lampris_guttatus regalecus_glesne 122.5 54.170
## 30 zu_elongatus regalecus_glesne 94.0 5.330
Note that the ages for the tree as a whole and for the lampridiform-polymixiiform-percopsiform-zeiform-gadiform assemblage are drawn from the same prior.
The new round of analyses was performed with exactly the same settings, except for the use of a different set of calibrations:
../pb -d 3ad39b9161154796bf2aef1aa746b469.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0018922 0.00148774 -cal extendcalib -sb -gtr -dgam 4 fastest1ext
../pb -d 5fad91d04084ac4043f35092e579c27f.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00141146 0.00153467 -cal extendcalib -sb -gtr -dgam 4 fastest2ext
../pb -d 64a99fde30f869ec30f91c8d2f3fe6fe.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00195453 0.00148684 -cal extendcalib -sb -gtr -dgam 4 fastest3ext
../pb -d fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0162411 0.000000000293377 -cal extendcalib -sb -gtr -dgam 4 largest3ext
Progress check: Friday 12/14/2016, 9:38 AM (fastest1ext), 9:38 AM (fastest2ext), 9:39 AM (fastest3ext). The numbers listed in all cells except for the “states” row refer to the effective sample sizes (ESS) calculated using Tracer 1.6:
| parameter | fastest1 | fastest2 | fastest3 |
|---|---|---|---|
| states | 17160 | 19164 | 16524 |
| loglik | 5111 | 6159 | 6045 |
| length | 3276 | 4328 | 4064 |
| sigma | 5133 | 7195 | 4589 |
| mu | 1162 | 1446 | 1243 |
| meanrates | 803 | 939 | 822 |
| p1 | - | - | - |
| p2 | - | - | - |
| scale | 458 | 1112 | 319 |
| alpha | 5733 | 6429 | 5225 |
| nmode | - | - | - |
| stat | 3464 | 4018 | 3597 |
| statalpha | - | - | - |
| rrent | 6395 | 7131 | 7131 |
| meanrr | 1596 | 1605 | 1675 |
Based on these results, the three chains were almost immediately soft-stopped (fastest1ext: 10:18 AM, fastest2ext: 10:21 AM, fastest3ext: 10:22 AM) at 17217, 19235, and 16583 generations (“points”), respectively. The “largest3ext” chain was soft-stopped on Sunday 1/15/2017 at 27463 generations. The resulting trees were then examined as described above, with the burnin fraction again set equal to one tenth of the total length of the chain and the subsampling rate estimated at 50 samples. For comparison, the longest autocorrelation times estimated by Tracer were equal to 33.74 for fastest1ext, 18.38 for fastest2ext, and 46.63 for fastest3ext. An exception was made for the “largest3ext” chain, where an examination in Tracer suggested the presence of autocorrelation times of up to 120.58, and where the subsampling frequency was therefore raised to 1 sample per 125 generation, while the burnin proportion was lowered to 2463 generations (~9% of the total chain length) in order to keep the number of samples at a minimum of 200. All traces showed the same features as those of the first set of chains, i.e., no clearly defined burnin and an overall “white noise” pattern.
../readdiv -x 1722 50 fastest1ext
../readdiv -x 1924 50 fastest2ext
../readdiv -x 1658 50 fastest3ext
../readdiv -x 2463 125 largest3ext
fastest1ext PhyloBayes time tree
fastest2ext PhyloBayes time tree
fastest3ext PhyloBayes time tree
largest3ext PhyloBayes time tree
Calibration overflows (italics) and underflows (bold):
| Calibration | Lower bound | Upper bound | Node # | Fastest1 | Fastest2 | Fastest3 | Largest3 |
|---|---|---|---|---|---|---|---|
| Root | 98.0 | 143.0 | 118 | 139.862 | 139.168 | 139.838 | 99.2247 |
| Hoplopteryx | 93.9 | 128.0 | 129 | 128.028 | 127.967 | 127.744 | 96.5496 |
| Calatomus | 11.9 | 63.3 | 210 | 31.1886 | 27.3302 | 31.8635 | 62.8299 |
| Triodon | 50.5 | 69.6 | 230 | 57.9151 | 55.1725 | 57.4373 | 61.6498 |
| Archaeotetraodon | 32.02 | 58.5 | 231 | 21.3779 | 20.7734 | 23.018 | 60.5811 |
| Tarkus | 49.0 | 61.6 | 229 | 49.7678 | 51.1516 | 49.691 | 59.8781 |
| Luvarus | 54.17 | 80.8 | 219 | 75.079 | 72.9126 | 74.6186 | 65.222 |
| Proacanthurus | 49.0 | 69.3 | 222 | 52.0107 | 52.6224 | 52.8573 | 63.9612 |
| Chaetodontidae | 29.62 | 66.0 | 220 | 71.7011 | 68.176 | 70.189 | 63.5144 |
| Archaeus | 54.17 | 80.8 | 193 | 56.3517 | 56.7469 | 57.0398 | 64.9513 |
| Eastmanalepes | 49.0 | 69.3 | 195 | 52.3301 | 51.6484 | 52.4744 | 63.9605 |
| Mene | 55.2 | 94.6 | 192 | 71.74 | 71.7088 | 69.5679 | 62.9523 |
| Eobothus | 49.0 | 79.7 | 186 | 61.4272 | 59.273 | 65.103 | 64.218 |
| Eobuglossus | 41.2 | 66.6 | 189 | 43.6976 | 42.3343 | 44.8876 | 62.4628 |
| Sphyraena | 49.0 | 79.7 | 183 | 73.4988 | 75.6311 | 76.2236 | 65.1913 |
| Rhamphexocoetus | 49.0 | 79.7 | 173 | 75.3851 | 72.8168 | 70.1643 | 64.3795 |
| Mahengechromis | 45.46 | 67.4 | 174 | 65.1868 | 63.6862 | 64.5467 | 62.2615 |
| Gasterorhamphosus | 69.71 | 111.3 | 154 | 80.6308 | 81.4773 | 79.0943 | 66.9342 |
| Prosolenostomus | 49.0 | 93.7 | 155 | 74.4455 | 77.8095 | 76.7663 | 65.1738 |
| Eocoelopoma | 54.17 | 94.2 | 162 | 59.956 | 68.1011 | 60.4741 | 64.2698 |
| Berybolcensis | 49.0 | 93.7 | 132 | 66.4465 | 71.7619 | 69.9133 | 77.9071 |
| Aipichthys | 98.0 | 143.0 | 119 | 134.434 | 132.915 | 131.604 | 98.0569 |
| Cretzeus | 69.71 | 111.3 | 125 | 108.019 | 107.643 | 107.209 | 96.578 |
| Rhinocephalus | 53.7 | 94.4 | 128 | 54.7493 | 54.2768 | 54.0105 | 77.8132 |
| Zenopsis | 32.02 | 91.4 | 126 | 45.4801 | 42.5645 | 43.8688 | 62.3619 |
| Homonotichthys | 93.9 | 128.0 | 123 | 122.279 | 123.691 | 123.789 | 96.6863 |
| Massamorichthys | 58.551 | 109.6 | 124 | 41.5663 | 50.3007 | 40.3652 | 94.9388 |
| Turkmene | 54.17 | 122.5 | 120 | 74.1299 | 70.6346 | 65.7669 | 96.0469 |
| Trachipterus | 5.33 | 94.0 | 121 | 22.3674 | 22.095 | 30.3064 | 93.1854 |
Note that the complete information on overflows and underflows (reporting percentages of the 95% HPDs that extend beyond the bounds rather than just the means) is stored in the _sample.calib file generated by readdiv.
Due to the omission of an outgroup in the sequence used to calculate the upper bound on the Homonotichthys calibration (assigned to the (Polymixia + Aphredoderus) node), a new calibration file had to be created to correct for the error:
## C1 C2 C3 C4
## 1 13 NA NA
## 2 lampris_guttatus chaetodon_ocellatus 143.00 98.00
## 3 lampris_guttatus aphredoderus_sayanus 128.82 98.00
## 4 polymixia_lowei aphredoderus_sayanus 116.35 93.90
## 5 percopsis_omiscomycus aphredoderus_sayanus 93.51 63.10
## 6 sargocentron_coruscum2 myripristis_leiognathus 109.29 49.00
## 7 scomber_scombrus scomberomorus_maculatus 95.58 54.17
## 8 pomacanthus_paru chaetodon_ocellatus 59.26 29.62
## 9 epibulus_insidiator scarus_ferrigineus 43.95 11.90
## 10 ogcocephalus_radiatus cryptopsaras_couesii 53.93 49.00
## 11 xenentodon_cancila pterophyllum_leopoldi 80.52 49.00
## 12 parachirus_xenicus poecilopsetta_plinthus 53.88 41.20
## 13 istiophorus_platypterus mene_maculatus 95.64 55.20
## 14 seriola_zonata alepes_kleinii 61.61 49.00
This file was then used to re-run all of the 5 calibrated PhyloBayes analyses summarized above. As in Phase 4, the settings were left unchanged except for the calibration file replacement:
../pb -d 3ad39b9161154796bf2aef1aa746b469.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0018922 0.00148774 -cal calib_root_corrected -sb -gtr -dgam 4 fastest1corr
../pb -d 5fad91d04084ac4043f35092e579c27f.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00141146 0.00153467 -cal calib_root_corrected -sb -gtr -dgam 4 fastest2corr
../pb -d 64a99fde30f869ec30f91c8d2f3fe6fe.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00195453 0.00148684 -cal calib_root_corrected -sb -gtr -dgam 4 fastest3corr
../pb -d 4ff6dce0b7cebc9428b4b77e79356484.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0172977 0.000000000148854 -cal calib_root_corrected -sb -gtr -dgam 4 largest1corr
../pb -d fe7aa5d9de0f5f51ef6365ef3b7f1e06.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0162411 0.000000000293377 -cal calib_root_corrected -sb -gtr -dgam 4 largest3corr
The analyses were soft-stopped on 02/11/2017 at 9:37 PM (fastest1corr: 16899 generations), 9:43 PM (fastest2corr: 18923 generations), and 9:45 PM (fastest3corr: 16254 generations); and on 03/05/2017 at 9:46 PM (largest3corr: 56023 generations). The examination of the resulting trace files in Trace yielded the following effective sample sizes for the sampled parameters:
| parameter | fastest1corr | fastest2corr | fastest3corr | largest3corr |
|---|---|---|---|---|
| states | 16899 | 18923 | 16254 | 56023 |
| loglik | 4522 | 7054 | 6033 | 6184 |
| length | 3561 | 3499 | 4370 | 9741 |
| sigma | 7102 | 5885 | 4887 | 314 |
| mu | 1242 | 1573 | 1081 | 2571 |
| meanrates | 1680 | 1676 | 1091 | 973 |
| p1 | - | - | - | - |
| p2 | - | - | - | - |
| scale | 2246 | 3170 | 3215 | 206 |
| alpha | 5488 | 6051 | 5488 | 43110 |
| nmode | - | - | - | - |
| stat | 3779 | 4471 | 3511 | 23788 |
| statalpha | - | - | - | - |
| rrent | 6821 | 6647 | 6616 | 25497 |
| meanrr | 1070 | 1540 | 1396 | 3791 |
| kappa | - | - | - | - |
| allocent | - | - | - | - |
The longest autocorrelation times found using Tracer were equal to 14.22 for fastest1corr (meanrr), 11.06 (meanrr) for fastest2corr, 13:54 (mu) for fastest3corr, and 244.66 (scale) for largest3corr, respectively. Based on these values, the chains were summarized as follows:
../readdiv -x 1690 50 fastest1corr
../readdiv -x 1892 50 fastest2corr
../readdiv -x 1625 50 fastest3corr
../readdiv -x 5602 250 largest3corr
fastest1corr PhyloBayes time tree
fastest2corr PhyloBayes time tree
fastest3corr PhyloBayes time tree
largest3corr PhyloBayes time tree
Calibration overflows (italics) and underflows (bold):
| Calibration | Lower bound | Upper bound | Node # | fastest1 | fastest2 | fastest3 | largest3 |
|---|---|---|---|---|---|---|---|
| Aipichthys | 98.0 | 128.82 | 119 | 126.783 | 125.313 | 122.758 | 97.0571 |
| Berybolcensis | 49.0 | 109.29 | 132 | 78.2571 | 80.9249 | 82.0382 | 56.6446 |
| Calatomus | 11.9 | 43.95 | 210 | 26.6171 | 23.4019 | 25.7685 | 45.0734 |
| Chaetodontidae | 29.62 | 59.26 | 220 | 59.9596 | 60.5989 | 59.167 | 54.7753 |
| Eobuglossus | 41.2 | 53.88 | 189 | 43.5713 | 42.2855 | 46.3462 | 53.205 |
| Eucoelopoma | 54.17 | 95.58 | 162 | 59.3606 | 67.9218 | 62.5009 | 55.0984 |
| Homonotichthys | 93.9 | 116.35 | 123 | 114.716 | 115.321 | 114.571 | 96.0677 |
| Mcconichthys | 63.1 | 93.51 | 124 | 46.7736 | 54.0181 | 43.2484 | 93.7347 |
| Ramphexocoetus | 49.0 | 80.52 | 173 | 77.5816 | 75.2246 | 76.3504 | 54.7811 |
| Tarkus | 49.0 | 53.93 | 229 | 49.6754 | 50.0795 | 49.7156 | 53.1321 |
| Mene | 55.2 | 95.64 | 192 | 72.717 | 72.109 | 72.1927 | 54.8223 |
| Eastmanelepes | 49.0 | 61.61 | 195 | 52.6179 | 51.0957 | 52.9884 | 54.8232 |
The complete information abou overflows and underflows is stored in the corresponding _sample.calib files.
Since the analysis is to be run on a fixed topology, it was not necessary to enforce monophyly in the “Taxa” tab of BEAUTi.
No tip dates or traits have been specified in the XML file. The substitution model was set to GTR+\(\Gamma\) with estimated base frequencies and the \(\Gamma\) model of site heterogeneity discretized into 4 rate categories. No codon partitioning was used. A lognormal uncorrelated relaxed clock was selected as the clock model (without continuous quantile parameterization).
Calculating the calibration priors so that the given upper bound correspond to the 95th percentile of an exponential distribution offset by the lower bound:
\[ \text{mean} = \frac{1}{\lambda} = \frac{\text{upper bound} - \text{offset}}{\text{95th percentile}} \]
## [1] "Aipichthys = 10.2879687454302"
## [1] "Berybolcensis = 20.1252964199217"
## [1] "Calatomus = 10.6985528322855"
## [1] "Chaetodontidae = 9.8940750686097"
## [1] "Eobuglossus = 4.23268798481684"
## [1] "Eocoelopoma = 13.8229975907938"
## [1] "Homonotichthys = 11.3962119717387"
## [1] "Mcconichthys = 10.1511073831451"
## [1] "Ramphexocoetus = 10.5216344859169"
## [1] "Tarkus = 1.645674429428"
## [1] "Mene = 13.4992036361193"
## [1] "Eastmanelepes = 4.20932141076816"
In order to keep the tree topology constant, all the rearrangements operators (subtreeSlide, narrowEXchange, wideExchange, wilsonBalding, as well as subtreeLeap, which is off by default) were deactivated in BEAUTi. All the remaining operators were kept at their default values.
(Note: Unchecking the topology operators manually simply leads to BEAUTi reverting to the original setup. When that happens, the operators must be deleted directly from the XML file. However, it is is possible to prevent this by selecting “fixed tree topology” in the “Operator mix” drop-down menu.)
The length of the chain was set equal to 50 million generations, sampling every 1000 generations.
beast -threads 12 -beagle_SSE fastest1.xml
beast -threads 12 -beagle_SSE fastest2.xml
beast -threads 12 -beagle_SSE fastest3.xml
beast -threads 12 -beagle_SSE largest1.xml
beast -threads 12 -beagle_SSE largest2.xml
beast -threads 12 -beagle_SSE largest3.xml
Progress check: Friday 11/25/2016, 5:35 PM (fastest1), 5:44 PM (fastest2), 6:06 PM (fastest3), 6:07 PM (largest1), 6:09 PM (largest2), 6:10 PM (largest3)
(The order-of-magnitude discrepancies are caused by the fact that different analyses were started on different days.)
Progress check: Friday 12/02/2016, 5:36 PM (fastest1run2), 5:39 (fastest2run2), 5:42 (fastest3), 6:02 (largest1), 6:03 (largest2), 6:06 (largest3)
Progress check: Wednesday 12/14/2016, 4:48 PM (fastest1run2), 4:49 (fastest2run2), 4:50 (fastest3run2), 4:52 (largest1), 4:53 (largest2), 4:54 (largest3)
Progress check: Monday 12/19/2016, 11:16 AM (fastest1run2), 11:18 AM (fastest2run2), 11:36 AM (fastest3run2), 11:38 AM (largest1), 11:40 AM (largest2), 10:11 AM (largest3)
Progress check: Sunday 01/01/2017, 09:37 AM (fastest1run2), 09:38 AM (fastest3run2), 9:40 AM (largest1), 09:45 AM (largest2)
Progress check: Tuesday 01/10/2017, 8:30 PM (fastest1run2), 8:36 PM (fastest3run2), 8:38 PM (largest1), 8:40 PM (largest2)
Progress check: Wednesday 01/11/2017, 10:58 AM (fastest3run2), 10:56 AM (largest1), 11:05 AM (largest2)
Progress check: Monday 01/16/2017, 9:41 PM (fastest3run2), 9:44 PM (largest2)
Progress check: Tuesday 02/07/2017, 01:10 PM (largest2)
Operator analysis – fastest1
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.83 | 660747 | 12725277 | 19.26 | 0.2311 |
| scale(ag) | 0.883 | 661295 | 12735820 | 19.26 | 0.1791 |
| scale(at) | 0.841 | 659766 | 12709519 | 19.26 | 0.2183 |
| scale(cg) | 0.853 | 659551 | 12696596 | 19.25 | 0.2257 |
| scale(gt) | 0.823 | 659544 | 12715860 | 19.28 | 0.2141 |
| frequencies | 0.013 | 659030 | 12690144 | 19.26 | 0.239 |
| scale(alpha) | 0.703 | 658986 | 12667283 | 19.22 | 0.2383 |
| scale(ucld.mean) | 0.937 | 1979321 | 38020330 | 19.21 | 0.1951 |
| scale(ucld.stdev) | 0.949 | 1979589 | 30209713 | 15.26 | 0.1781 |
| scale(treeModel.rootHeight) | 0.887 | 1978926 | 2525621 | 1.28 | 0.2174 |
| uniform(nodeHeights(treeModel)) | 19799523 | 66403294 | 3.35 | 0.1446 | |
| scale(birthDeath.meanGrowthRate) | 0.553 | 1980489 | 313550 | 0.16 | 0.2424 |
| scale(birthDeath.relativeDeathRate) | 0.151 | 1981607 | 301348 | 0.15 | 0.2699 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.962 | 1979956 | 23097102 | 11.67 | 0.2333 |
| swapOperator(branchRates.categories) | 6600069 | 26808160 | 4.06 | 0.0911 | |
| uniformInteger(branchRates.categories) | 6601601 | 20645637 | 3.13 | 0.1637 |
Operator analysis – fastest2
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.82 | 660158 | 8672488 | 13.14 | 0.2296 |
| scale(ag) | 0.887 | 659057 | 8662889 | 13.14 | 0.2116 |
| scale(at) | 0.835 | 660445 | 8684486 | 13.15 | 0.226 |
| scale(cg) | 0.848 | 660260 | 8685804 | 13.16 | 0.2234 |
| scale(gt) | 0.821 | 659789 | 8666099 | 13.13 | 0.2262 |
| frequencies | 0.015 | 659293 | 8669795 | 13.15 | 0.2336 |
| scale(alpha) | 0.661 | 661119 | 8666069 | 13.11 | 0.2407 |
| scale(ucld.mean) | 0.934 | 1980229 | 25974357 | 13.12 | 0.2012 |
| scale(ucld.stdev) | 0.946 | 1979266 | 22205332 | 11.22 | 0.1939 |
| scale(treeModel.rootHeight) | 0.89 | 1978597 | 1821766 | 0.92 | 0.2478 |
| uniform(nodeHeights(treeModel)) | 19807202 | 46560273 | 2.35 | 0.1559 | |
| scale(birthDeath.meanGrowthRate) | 0.557 | 1982184 | 293504 | 0.15 | 0.2459 |
| scale(birthDeath.relativeDeathRate) | 0.147 | 1979410 | 281725 | 0.14 | 0.2628 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.962 | 1981765 | 15764484 | 7.95 | 0.2316 |
| swapOperator(branchRates.categories) | 6595972 | 18805478 | 2.85 | 0.1027 | |
| uniformInteger(branchRates.categories) | 6595254 | 14463753 | 2.19 | 0.1834 |
Operator analysis – fastest3
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.839 | 661439 | 26785582 | 40.5 | 0.2199 |
| scale(ag) | 0.89 | 659014 | 26717199 | 40.54 | 0.1878 |
| scale(at) | 0.836 | 662035 | 26830436 | 40.53 | 0.2104 |
| scale(cg) | 0.859 | 659911 | 26733550 | 40.51 | 0.2284 |
| scale(gt) | 0.833 | 659782 | 26725992 | 40.51 | 0.2259 |
| frequencies | 0.013 | 659708 | 26711803 | 40.49 | 0.239 |
| scale(alpha) | 0.673 | 660367 | 26708947 | 40.45 | 0.2404 |
| scale(ucld.mean) | 0.939 | 1979483 | 80118681 | 40.47 | 0.1863 |
| scale(ucld.stdev) | 0.953 | 1978891 | 63173044 | 31.92 | 0.1838 |
| scale(treeModel.rootHeight) | 0.882 | 1980427 | 5096614 | 2.57 | 0.2331 |
| uniform(nodeHeights(treeModel)) | 19799578 | 136515823 | 6.89 | 0.1485 | |
| scale(birthDeath.meanGrowthRate) | 0.548 | 1979678 | 529734 | 0.27 | 0.2361 |
| scale(birthDeath.relativeDeathRate) | 0.147 | 1980545 | 505737 | 0.26 | 0.261 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.962 | 1978362 | 48494033 | 24.51 | 0.2346 |
| swapOperator(branchRates.categories) | 6601937 | 55296150 | 8.38 | 0.0903 | |
| uniformInteger(branchRates.categories) | 6598843 | 42006022 | 6.37 | 0.1676 |
Operator analysis – largest1
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.867 | 658820 | 183774525 | 278.94 | 0.2188 |
| scale(ag) | 0.899 | 658719 | 183783261 | 279.0 | 0.1982 |
| scale(at) | 0.866 | 660382 | 184133731 | 278.83 | 0.2087 |
| scale(cg) | 0.877 | 661270 | 184529128 | 279.05 | 0.2193 |
| scale(gt) | 0.866 | 659165 | 183948566 | 279.06 | 0.2158 |
| frequencies | 0.01 | 659218 | 183698826 | 278.66 | 0.2376 |
| scale(alpha) | 0.674 | 661266 | 184561183 | 279.1 | 0.2436 |
| scale(ucld.mean) | 0.95 | 1980189 | 552340540 | 278.93 | 0.1808 |
| scale(ucld.stdev) | 0.984 | 1980148 | 291959479 | 147.44 | 0.0686 |
| scale(treeModel.rootHeight) | 0.969 | 1979815 | 26753339 | 13.51 | 0.1436 |
| uniform(nodeHeights(treeModel)) | 19797336 | 969520273 | 48.97 | 0.4089 | |
| scale(birthDeath.meanGrowthRate) | 0.553 | 1979627 | 421092 | 0.21 | 0.2403 |
| scale(birthDeath.relativeDeathRate) | 0.151 | 1983201 | 415925 | 0.21 | 0.2688 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.964 | 1979966 | 338319988 | 170.87 | 0.2447 |
| swapOperator(branchRates.categories) | 6598919 | 397530566 | 60.24 | 0.191 | |
| uniformInteger(branchRates.categories) | 6601959 | 303947951 | 46.04 | 0.279 |
Operator analysis – largest2
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.212 | 659982 | 506771328 | 767.86 | 0.0732 |
| scale(ag) | 0.223 | 660596 | 509593306 | 771.41 | 0.0752 |
| scale(at) | 0.216 | 659540 | 509270896 | 772.16 | 0.0718 |
| scale(cg) | 0.213 | 659038 | 573688140 | 870.49 | 0.0695 |
| scale(gt) | 0.21 | 660452 | 510072660 | 772.31 | 0.0695 |
| frequencies | 0.005 | 660389 | 511310855 | 774.26 | 0.0562 |
| scale(alpha) | 0.187 | 658735 | 187479801 | 284.61 | 0.1493 |
| scale(ucld.mean) | 0.303 | 1979260 | 563090279 | 284.5 | 0.1712 |
| scale(ucld.stdev) | 0.146 | 1981161 | 460653492 | 232.52 | 0.2578 |
| scale(treeModel.rootHeight) | 0.647 | 1979746 | 30696821 | 15.51 | 0.2509 |
| uniform(nodeHeights(treeModel)) | 19802437 | 879144583 | 44.4 | 0.7514 | |
| scale(birthDeath.meanGrowthRate) | 0.552 | 1980792 | 461126 | 0.23 | 0.2559 |
| scale(birthDeath.relativeDeathRate) | 0.165 | 1980368 | 355828 | 0.18 | 0.2458 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.961 | 1979370 | 344807687 | 174.2 | 0.2508 |
| swapOperator(branchRates.categories) | 6599973 | 356873046 | 54.07 | 0.9937 | |
| uniformInteger(branchRates.categories) | 6598161 | 274060739 | 41.54 | 0.9957 |
Operator analysis – largest3
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.819 | 659031 | 89206726 | 135.36 | 0.2316 |
| scale(ag) | 0.857 | 660868 | 89427641 | 135.32 | 0.2107 |
| scale(at) | 0.824 | 659432 | 89315593 | 135.44 | 0.2287 |
| scale(cg) | 0.821 | 659377 | 89299977 | 135.43 | 0.2338 |
| scale(gt) | 0.81 | 662027 | 89636024 | 135.4 | 0.2156 |
| frequencies | 0.015 | 660602 | 89473019 | 135.44 | 0.2348 |
| scale(alpha) | 0.617 | 659716 | 89281998 | 135.33 | 0.2415 |
| scale(ucld.mean) | 0.929 | 1980263 | 268234737 | 135.45 | 0.1957 |
| scale(ucld.stdev) | 0.978 | 1978061 | 146797457 | 74.21 | 0.1177 |
| scale(treeModel.rootHeight) | 0.953 | 1979592 | 13203639 | 6.67 | 0.1856 |
| uniform(nodeHeights(treeModel)) | 19806743 | 438928272 | 22.16 | 0.4735 | |
| scale(birthDeath.meanGrowthRate) | 0.555 | 1979445 | 497254 | 0.25 | 0.2418 |
| scale(birthDeath.relativeDeathRate) | 0.157 | 1976384 | 490952 | 0.25 | 0.2814 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.962 | 1980806 | 161401155 | 81.48 | 0.2281 |
| swapOperator(branchRates.categories) | 6598272 | 180768292 | 27.4 | 0.2154 | |
| uniformInteger(branchRates.categories) | 6599381 | 136558073 | 20.69 | 0.3281 |
Operator analysis – fastest1run2
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.829 | 7259872 | 169113074 | 23.29 | 0.2283 |
| scale(ag) | 0.901 | 7261054 | 169072111 | 23.28 | 0.2226 |
| scale(at) | 0.843 | 7262718 | 169158813 | 23.29 | 0.2216 |
| scale(cg) | 0.855 | 7256664 | 169082966 | 23.3 | 0.2304 |
| scale(gt) | 0.833 | 7267287 | 169272563 | 23.29 | 0.2306 |
| frequencies | 0.014 | 7258025 | 169035142 | 23.29 | 0.237 |
| scale(alpha) | 0.705 | 7257684 | 168809560 | 23.26 | 0.2404 |
| scale(ucld.mean) | 0.941 | 21781779 | 506603278 | 23.26 | 0.2095 |
| scale(ucld.stdev) | 0.952 | 21776519 | 412252595 | 18.93 | 0.1948 |
| scale(treeModel.rootHeight) | 0.888 | 21769604 | 32485229 | 1.49 | 0.2224 |
| uniform(nodeHeights(treeModel)) | 217805411 | 852096133 | 3.91 | 0.1454 | |
| scale(birthDeath.meanGrowthRate) | 0.554 | 21782949 | 4248737 | 0.2 | 0.2431 |
| scale(birthDeath.relativeDeathRate) | 0.146 | 21778539 | 4047575 | 0.19 | 0.2591 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.962 | 21778069 | 305910176 | 14.05 | 0.2328 |
| swapOperator(branchRates.categories) | 72601538 | 343235335 | 4.73 | 0.0909 | |
| uniformInteger(branchRates.categories) | 72602288 | 263832702 | 3.63 | 0.1639 |
Operator analysis – fastest2run2
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.819 | 659031 | 89206726 | 135.36 | 0.2316 |
| scale(ag) | 0.857 | 660868 | 89427641 | 135.32 | 0.2107 |
| scale(at) | 0.824 | 659432 | 89315593 | 135.44 | 0.2287 |
| scale(cg) | 0.821 | 659377 | 89299977 | 135.43 | 0.2338 |
| scale(gt) | 0.81 | 662027 | 89636024 | 135.4 | 0.2156 |
| frequencies | 0.015 | 660602 | 89473019 | 135.44 | 0.2348 |
| scale(alpha) | 0.617 | 659716 | 89281998 | 135.33 | 0.2415 |
| scale(ucld.mean) | 0.929 | 1980263 | 268234737 | 135.45 | 0.1957 |
| scale(ucld.stdev) | 0.978 | 1978061 | 146797457 | 74.21 | 0.1177 |
| scale(treeModel.rootHeight) | 0.953 | 1979592 | 13203639 | 6.67 | 0.1856 |
| uniform(nodeHeights(treeModel)) | 19806743 | 438928272 | 22.16 | 0.4735 | |
| scale(birthDeath.meanGrowthRate) | 0.555 | 1979445 | 497254 | 0.25 | 0.2418 |
| scale(birthDeath.relativeDeathRate) | 0.157 | 1976384 | 490952 | 0.25 | 0.2814 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.962 | 1980806 | 161401155 | 81.48 | 0.2281 |
| swapOperator(branchRates.categories) | 6598272 | 180768292 | 27.4 | 0.2154 | |
| uniformInteger(branchRates.categories) | 6599381 | 136558073 | 20.69 | 0.3281 |
Operator analysis – fastest3run2
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.842 | 6599217 | 163846742 | 24.83 | 0.2268 |
| scale(ag) | 0.902 | 6602838 | 163951771 | 24.83 | 0.2182 |
| scale(at) | 0.846 | 6600926 | 163866015 | 24.82 | 0.2285 |
| scale(cg) | 0.855 | 6602336 | 163899948 | 24.82 | 0.2184 |
| scale(gt) | 0.831 | 6601330 | 163881819 | 24.83 | 0.2228 |
| frequencies | 0.014 | 6596966 | 163790684 | 24.83 | 0.2362 |
| scale(alpha) | 0.674 | 6598637 | 163636433 | 24.8 | 0.2413 |
| scale(ucld.mean) | 0.942 | 19804154 | 490972911 | 24.79 | 0.1999 |
| scale(ucld.stdev) | 0.957 | 19793283 | 382899915 | 19.34 | 0.203 |
| scale(treeModel.rootHeight) | 0.883 | 19797648 | 31382714 | 1.59 | 0.2278 |
| uniform(nodeHeights(treeModel)) | 197999369 | 809532330 | 4.09 | 0.1481 | |
| scale(birthDeath.meanGrowthRate) | 0.553 | 19798376 | 3563231 | 0.18 | 0.241 |
| scale(birthDeath.relativeDeathRate) | 0.144 | 19805473 | 3413702 | 0.17 | 0.2545 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.962 | 19797881 | 296985830 | 15.0 | 0.2332 |
| swapOperator(branchRates.categories) | 65993010 | 327338784 | 4.96 | 0.0904 | |
| uniformInteger(branchRates.categories) | 66008556 | 252144290 | 3.82 | 0.1674 |
Tracer effective sample sizes (ESS)
| parameter | fastest1 | fastest2 | fastest3 | largest1 | largest2 | largest3 |
|---|---|---|---|---|---|---|
| likelihood | 3443 | 4600 | 3535 | 77 | 13 | 1984 |
| treeModel.rootHeight | 29 | 61 | 36 | 32 | 75 | 35 |
| tmrca(Aipichthys) | 40 | 96 | 26 | 137 | 475 | 36 |
| tmrca(Berybolcensis) | 204 | 229 | 221 | 57 | 8034 | 259 |
| tmrca(Calatomus) | 28 | 38 | 31 | 59 | 723 | 384 |
| tmrca(Chaetodontidae) | 62 | 92 | 87 | 132 | 8185 | 498 |
| tmrca(Eobuglossus) | 303 | 255 | 193 | 47 | 5676 | 174 |
| tmrca(Eucoelopoma) | 170 | 281 | 196 | 84 | 3149 | 370 |
| tmrca(Homonotichthys) | 29 | 67 | 214 | 133 | 4325 | 437 |
| tmrca(Mcconichthys) | 57 | 282 | 94 | 42 | 9705 | 40 |
| tmrca(Ramphexocoetus) | 167 | 94 | 86 | 158 | 6373 | 238 |
| tmrca(Tarkus) | 1278 | 1671 | 999 | 21 | 3706 | 447 |
| tmrca(Mene) | 156 | 106 | 247 | 313 | 2225 | 565 |
| tmrca(Eastmanelepes) | 318 | 722 | 393 | 660 | 5607 | 3173 |
| birthDeath.meanGrowthRate | 48 | 235 | 87 | 141 | 25 | 246 |
| birthDeath.relativeDeathRate | 9717 | 2151 | 1954 | 22290 | 210 | 35211 |
| ucld.mean | 17 | 37 | 19 | 12 | 28 | 21 |
| ucld.stdev | 102 | 136 | 47 | 12802 | 70 | 16142 |
| meanRate | 19 | 51 | 26 | 19 | 28 | 48 |
| coefficientOfVariation | 53 | 27 | 66 | 62 | 69 | 118 |
| covariance | 55 | 22 | 59 | 3270 | 42771 | 1448 |
| speciation | 18 | 51 | 26 | 19 | 25 | 47 |
Based on these values, three new XLM files were generated using BEAUTi for additional runs. The settings were exactly identical except for the length of the chain, which was set to 550 million generations for fastest1, 450 million generations for fastest2, and 500 million generations for fastest3. The lengths were chosen so that even the parameter with fewest samples would reach an ESS of 200 in both chains, under the conservative assumption that ESS increases linearly with chain length.
Since neither the posterior trace nor the likelihood trace showed a clearly delimited burnin stage when examined in Tracer, the default burnin proportion of 1/10 was reduced to 5 million generations, the same as for the first run. This fraction was then removed from both chains as burnin in the process of combining them using LogCombiner:
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 fastest1.log fastest1-run2.log fastest1-combined.log
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 fastest2.log fastest2-run2.log fastest2-combined.log
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 fastest3.log fastest3-run2.log fastest3-combined.log
The combined files were then re-downloaded and analyzed in Tracer on the local machine in addition to being analyzed on the cluster using LogAnalyser:
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 0 -ess fastest1-combined.log
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 0 -ess fastest2-combined.log
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 0 -ess fastest3-combined.log
fastest1
| parameter | run1 (burnin = 5e6) | run2 (burnin = 5e6) | combined (Tracer) | combined (LogAnalyser) |
|---|---|---|---|---|
| posterior | 38 | 502 | 534 | 534 |
| prior | 22 | 283 | 302 | 302 |
| likelihood | 3443 | 31955 | 35609 | 35609 |
| treeModel.rootHeight | 29 | 305 | 332 | 332 |
| tmrca(Aipichthys) | 40 | 468 | 504 | 123 |
| tmrca(Berybolcensis) | 204 | 1983 | 2196 | 2196 |
| tmrca(Calatomus) | 28 | 277 | 303 | 303 |
| tmrca(Chaetodontidae) | 62 | 757 | 837 | 837 |
| tmrca(Eobuglossus) | 303 | 3018 | 3141 | 3141 |
| tmrca(Eucoelopoma) | 170 | 1712 | 1856 | 1856 |
| tmrca(Homonotichthys) | 29 | 465 | 489 | 489 |
| tmrca(Mcconichthys) | 57 | 673 | 733 | 733 |
| tmrca(Ramphexocoetus) | 167 | 730 | 803 | 803 |
| tmrca(Tarkus) | 1278 | 14121 | 14736 | 14736 |
| tmrca(Mene) | 156 | 1393 | 1557 | 1557 |
| tmrca(Eastmanelepes) | 318 | 5114 | 5366 | 5366 |
| birthDeath.meanGrowthRate | 48 | 584 | 626 | 626 |
| birthDeath.relativeDeathRate | 9717 | 12780 | 14498 | 14498 |
| ucld.mean | 17 | 245 | 262 | 262 |
| ucld.stdev | 102 | 711 | 783 | 783 |
| meanRate | 19 | 241 | 258 | 258 |
| coefficientOfVariation | 53 | 294 | 324 | 324 |
| covariance | 55 | 384 | 429 | 429 |
| speciation | 18 | 241 | 258 | 258 |
fastest2
| parameter | run1 (burnin = 5e6) | run2 (burnin = 5e6) | combined (Tracer) | combined (LogAnalyser) |
|---|---|---|---|---|
| posterior | 146 | 482 | 560 | 560 |
| prior | 66 | 270 | 309 | 309 |
| likelihood | 4600 | 38026 | 42378 | 42378 |
| treeModel.rootHeight | 61 | 256 | 289 | 289 |
| tmrca(Aipichthys) | 96 | 687 | 773 | 773 |
| tmrca(Berybolcensis) | 229 | 2212 | 2477 | 2477 |
| tmrca(Calatomus) | 38 | 309 | 340 | 340 |
| tmrca(Chaetodontidae) | 92 | 933 | 1024 | 1024 |
| tmrca(Eobuglossus) | 255 | 4125 | 4186 | 4186 |
| tmrca(Eucoelopoma) | 281 | 2536 | 2819 | 2819 |
| tmrca(Homonotichthys) | 67 | 603 | 669 | 669 |
| tmrca(Mcconichthys) | 282 | 810 | 936 | 936 |
| tmrca(Ramphexocoetus) | 94 | 696 | 781 | 781 |
| tmrca(Tarkus) | 1671 | 14391 | 15934 | 15934 |
| tmrca(Mene) | 106 | 1476 | 1572 | 1572 |
| tmrca(Eastmanelepes) | 722 | 6860 | 7598 | 7598 |
| birthDeath.meanGrowthRate | 235 | 521 | 608 | 608 |
| birthDeath.relativeDeathRate | 2151 | 12258 | 13525 | 13525 |
| ucld.mean | 37 | 211 | 238 | 2.691E-3 |
| ucld.stdev | 136 | 616 | 712 | 712 |
| meanRate | 51 | 220 | 248 | 2.1962E-3 |
| coefficientOfVariation | 27 | 229 | 249 | 249 |
| covariance | 22 | 437 | 437 | 437 |
| speciation | 51 | 222 | 251 | 251 |
fastest3
| parameter | run1 (burnin = 5e6) | run2 (burnin = 5e6) | combined (Tracer) | combined (LogAnalyser) |
|---|---|---|---|---|
| posterior | 85 | 617 | 670 | 670 |
| prior | 37 | 310 | 336 | 336 |
| likelihood | 3535 | 38339 | 41780 | 41780 |
| treeModel.rootHeight | 36 | 330 | 357 | 357 |
| tmrca(Aipichthys) | 26 | 470 | 481 | 481 |
| tmrca(Berybolcensis) | 221 | 2280 | 2503 | 2503 |
| tmrca(Calatomus) | 31 | 316 | 347 | 347 |
| tmrca(Chaetodontidae) | 87 | 650 | 739 | 739 |
| tmrca(Eobuglossus) | 193 | 1951 | 2169 | 2169 |
| tmrca(Eucoelopoma) | 196 | 1312 | 1451 | 1451 |
| tmrca(Homonotichthys) | 214 | 370 | 384 | 384 |
| tmrca(Mcconichthys) | 94 | 806 | 248 | 248 |
| tmrca(Ramphexocoetus) | 86 | 946 | 1025 | 1025 |
| tmrca(Tarkus) | 999 | 10574 | 11611 | 11611 |
| tmrca(Mene) | 247 | 1348 | 1518 | 1518 |
| tmrca(Eastmanelepes) | 393 | 4983 | 5321 | 5321 |
| birthDeath.meanGrowthRate | 87 | 720 | 778 | 778 |
| birthDeath.relativeDeathRate | 1954 | 15661 | 16300 | 16300 |
| ucld.mean | 19 | 242 | 255 | 255 |
| ucld.stdev | 47 | 600 | 633 | 633 |
| meanRate | 26 | 254 | 275 | 275 |
| coefficientOfVariation | 66 | 270 | 291 | 291 |
| covariance | 59 | 256 | 291 | 291 |
| speciation | 26 | 254 | 275 | 275 |
As can be seen, the ESS values reported by Tracer and LogAnalyser are identical for most parameters. All the parameters for which the results of the two programs differ exhibit effective sample sizes that exceed 200 in Tracer but do not reach this threshold in LogAnalyser. Note that the parameter means are estimated identically by both programs even in these cases. Here, it is assumed that the conflicting values are reported incorrectly by LogAnalyser, particularly with regard to the fact that the ESS values shown for the ucld.mean and meanRate parameters of the fastest2 partition imply unrealistically large autocorrelation times on the order of \(10^{11}\) and are four orders of magnitude lower than the sum of the ESS values obtained from the partial analyses (i.e., in runs 1 and 2).
Therefore, the analyses were regarded as completed, and the respective .trees files from each pair of runs were combined under the same settings as the log files above:
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 -trees fastest1.trees fastest1-run2.trees fastest1-combined.trees
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 -trees fastest2.trees fastest2-run2.trees fastest2-combined.trees
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogCombiner -burnin 5000000 -trees fastest3.trees fastest3-run2.trees fastest3-combined.trees
The results were then summarized using TreeAnnotator:
java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 0 -heights median fastest1-combined.trees fastest1.tre
java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 0 -heights median fastest2-combined.trees fastest2.tre
java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 0 -heights median fastest3-combined.trees fastest3.tre
fastest2 BEAST time tree
Unlike PhyloBayes and mcmcTREE, BEAST does not require the user to place a calibration on the root of the tree. Since no root calibration was included among the 12 fossil calibrations listed and justified in Alfaro et al. (in prep.; see Supplemental Figure S3), this option was made use of, resulting in trees with excessively old deep nodes. To correct for the unrealistic age estimates, a second round of analyses was launched with a root calibration added to the remaining 12. As in the PhyloBayes and mcmcTREE analyses, the lower bound on the root age was set equal to 98 Ma, with the corresponding upper bound at 143 Ma. For the BEAST analyses, the minimum was treated as hard and the maximum as soft, albeit slightly “harder” than the maxima placed on the remaining calibrations. This was achieved by treating it as the 97.5th rather than 95th percentile:
root <- (143 - 98)/qexp(0.975)
paste("Root", root, sep = " = ")
## [1] "Root = 12.1988263806818"
As seen above, the resulting density has the desirable property of being more heavy-tailed than the density assigned to the second oldest calibration in the tree (the Lampris/Aphredoderus node calibrated by Aipichthys). The Lampris/Aphredoderus clade age is therefore relatively more likely to take on values closer to the minimum of 98 Ma, while the root age has a relatively higher probability of being closer to the maximum of 143 Ma. This conforms to the biological requirement that ancestors be older than their descendants.
Aside from the introduction of a root calibration, the only difference between the first and second round of BEAST analyses concerned operator settings. Since ucld.mean and ucld.stdev were among the least well-sampled parameters in the first round of analyses, their operators received an increased tuning setting of 0.9 (default value: 0.75) and an increased weight of 6.0 (default value: 3.0) in the second round. These changes only concern the behavior of the MCMC sampler, and therefore do not affect the comparability of the resulting estimates.
Based on the autocorrelation times observed in the first round of analyses, the length of the chain was set to 500 million generations for all the three analyzed partitions (fastest1, fastest2, fastest3), with the sampling frequency kept at 1 sample per 1000 generations. The analyses were launched using the following commands on Friday 01/27/2017, 8:43 PM (fastest1rootcal), 8:47 PM (fastest2rootcal), 8:49 PM (fastest3rootcal):
beast -threads 12 -beagle_SSE fastest1rootcal.xml
beast -threads 12 -beagle_SSE fastest2rootcal.xml
beast -threads 12 -beagle_SSE fastest3rootcal.xml
The chains reached their target length after 23.749 days (fastest1rootcal), 18.487 days (fastest2rootcal), and 24.822 days (fastest3rootcal), with the operator analyses yielding the results below:
Operator analysis – fastest1rootcal
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.824 | 6116198 | 69919813 | 11.43 | 0.2199 |
| scale(ag) | 0.899 | 6113223 | 69888056 | 11.43 | 0.2172 |
| scale(at) | 0.841 | 6113975 | 69908670 | 11.43 | 0.2194 |
| scale(cg) | 0.858 | 6109667 | 69851372 | 11.43 | 0.2353 |
| scale(gt) | 0.833 | 6112697 | 69886839 | 11.43 | 0.2297 |
| frequencies | 0.014 | 6111604 | 69872193 | 11.43 | 0.2315 |
| scale(alpha) | 0.698 | 6110480 | 69761039 | 11.42 | 0.2322 |
| scale(ucld.mean) | 0.944 | 36669057 | 418799547 | 11.42 | 0.2274 |
| scale(ucld.stdev) | 0.956 | 36666945 | 364960853 | 9.95 | 0.222 |
| scale(treeModel.rootHeight) | 0.921 | 18325612 | 14342138 | 0.78 | 0.2226 |
| uniform(nodeHeights(treeModel)) | 183322777 | 350692465 | 1.91 | 0.1475 | |
| scale(birthDeath.meanGrowthRate) | 0.55 | 18332873 | 2494408 | 0.14 | 0.2376 |
| scale(birthDeath.relativeDeathRate) | 0.143 | 18337848 | 2409742 | 0.13 | 0.2535 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.966 | 18326956 | 127184168 | 6.94 | 0.233 |
| swapOperator(branchRates.categories) | 61106415 | 143560830 | 2.35 | 0.0917 | |
| uniformInteger(branchRates.categories) | 61123673 | 109593515 | 1.79 | 0.1659 |
Operator analysis – fastest2rootcal
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.817 | 6107430 | 52648293 | 8.62 | 0.2244 |
| scale(ag) | 0.889 | 6109772 | 52678774 | 8.62 | 0.2152 |
| scale(at) | 0.831 | 6107465 | 52661821 | 8.62 | 0.219 |
| scale(cg) | 0.85 | 6113335 | 52704317 | 8.62 | 0.2274 |
| scale(gt) | 0.821 | 6114513 | 52715590 | 8.62 | 0.2258 |
| frequencies | 0.015 | 6109295 | 52679348 | 8.62 | 0.2336 |
| scale(alpha) | 0.656 | 6115332 | 52583843 | 8.6 | 0.236 |
| scale(ucld.mean) | 0.941 | 36666356 | 315470882 | 8.6 | 0.2295 |
| scale(ucld.stdev) | 0.952 | 36666448 | 273498416 | 7.46 | 0.2223 |
| scale(treeModel.rootHeight) | 0.912 | 18337128 | 11233196 | 0.61 | 0.2175 |
| uniform(nodeHeights(treeModel)) | 183335452 | 287083670 | 1.57 | 0.1571 | |
| scale(birthDeath.meanGrowthRate) | 0.552 | 18337883 | 2421391 | 0.13 | 0.2407 |
| scale(birthDeath.relativeDeathRate) | 0.149 | 18331103 | 2345343 | 0.13 | 0.265 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.966 | 18337940 | 96210275 | 5.25 | 0.2343 |
| swapOperator(branchRates.categories) | 61108812 | 116351803 | 1.9 | 0.1018 | |
| uniformInteger(branchRates.categories) | 61101736 | 89251300 | 1.46 | 0.1836 |
Operator analysis – fastest3rootcal
| Operator | Tuning | Count | Time | Time/Op | Pr(accept) |
|---|---|---|---|---|---|
| scale(ac) | 0.845 | 6110036 | 71483642 | 11.7 | 0.2317 |
| scale(ag) | 0.902 | 6108468 | 71465827 | 11.7 | 0.217 |
| scale(at) | 0.844 | 6113742 | 71520575 | 11.7 | 0.2247 |
| scale(cg) | 0.853 | 6114443 | 71526662 | 11.7 | 0.2138 |
| scale(gt) | 0.834 | 6113813 | 71512263 | 11.7 | 0.2296 |
| frequencies | 0.014 | 6111004 | 71488987 | 11.7 | 0.2353 |
| scale(alpha) | 0.671 | 6108717 | 71335121 | 11.68 | 0.2385 |
| scale(ucld.mean) | 0.948 | 36662189 | 428234059 | 11.68 | 0.2308 |
| scale(ucld.stdev) | 0.96 | 36666124 | 335433111 | 9.15 | 0.2262 |
| scale(treeModel.rootHeight) | 0.916 | 18343791 | 16323942 | 0.89 | 0.2155 |
| uniform(nodeHeights(treeModel)) | 183319164 | 403954935 | 2.2 | 0.1489 | |
| scale(birthDeath.meanGrowthRate) | 0.552 | 18337065 | 2549700 | 0.14 | 0.2398 |
| scale(birthDeath.relativeDeathRate) | 0.139 | 18326676 | 2460407 | 0.13 | 0.2464 |
| up:ucld.mean down:nodeHeights(treeModel) | 0.966 | 18335384 | 129993472 | 7.09 | 0.2319 |
| swapOperator(branchRates.categories) | 61120866 | 162668944 | 2.66 | 0.0901 | |
| uniformInteger(branchRates.categories) | 61108518 | 126829868 | 2.08 | 0.1672 |
The results were examined in LogAnalyser with the following results:
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 50000000 -ess fastest1rootcal.log
## statistic mean stdErr ESS
## 1 posterior -130727.8863000 15.13010000 607.4022
## 2 prior -1955.6506000 10.43390000 293.9210
## 3 likelihood -128772.2357000 11.12130000 26500.6609
## 4 treeModel.rootHeight 180.9216000 18.34330000 304.8281
## 5 tmrca(Aipichthys) 134.3377000 14.85950000 462.6585
## 6 tmrca(Berybolcensis) 60.2411000 11.17370000 1399.6284
## 7 tmrca(Calatomus) 25.3585000 8.16650000 249.5550
## 8 tmrca(Chaetodontidae) 49.6873000 13.21410000 567.2447
## 9 tmrca(Eastmanelepes) 51.8293000 2.72680000 5125.4621
## 10 tmrca(Eobuglossus) 43.9373000 2.58150000 2286.4395
## 11 tmrca(Eucoelopoma) 57.0837000 2.97570000 988.3860
## 12 tmrca(Homonotichthys) 102.9297000 7.58240000 534.3246
## 13 tmrca(Mcconichthys) 67.9045000 4.47390000 605.2742
## 14 tmrca(Mene) 63.3960000 6.54420000 911.6944
## 15 tmrca(Ramphexocoetus) 60.4203000 8.78960000 700.4769
## 16 tmrca(Tarkus) 50.1588000 1.15460000 6325.0426
## 17 tmrca(Root) 180.9216000 18.34330000 304.8281
## 18 birthDeath.meanGrowthRate 0.0162000 0.00196880 885.6137
## 19 birthDeath.relativeDeathRate 0.0705000 0.06510000 94695.8974
## 20 ac 0.1948000 0.00617880 215900.0000
## 21 ag 0.9825000 0.02830000 81148.0984
## 22 at 0.2465000 0.00746030 167540.0000
## 23 cg 0.2477000 0.00699960 162220.0000
## 24 gt 0.1918000 0.00620960 164810.0000
## 25 frequencies1 0.2191000 0.00307170 113350.0000
## 26 frequencies2 0.2796000 0.00348230 95910.2719
## 27 frequencies3 0.2740000 0.00352670 96100.3316
## 28 frequencies4 0.2273000 0.00307080 115190.0000
## 29 alpha 8.5260000 0.46400000 408560.0000
## 30 ucld.mean 0.0029742 0.00028735 221.1752
## 31 ucld.stdev 0.9491000 0.03470000 706.5774
## 32 meanRate 0.0023827 0.00015022 246.5243
## 33 coefficientOfVariation 1.1973000 0.08460000 270.5419
## 34 covariance 0.1959000 0.06250000 282.6541
## 35 treeLikelihood -128772.2357000 11.12130000 26500.6609
## 36 speciation -596.8042000 7.41600000 246.6509
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 50000000 -ess fastest2rootcal.log
## statistic mean stdErr ESS
## 1 posterior -106365.5556000 15.40380000 522.4109
## 2 prior -1957.5347000 11.08470000 269.3418
## 3 likelihood -104408.0209000 10.99870000 37114.1447
## 4 treeModel.rootHeight 180.7353000 20.05020000 269.2439
## 5 tmrca(Aipichthys) 127.3942000 13.54010000 613.5138
## 6 tmrca(Berybolcensis) 60.0970000 10.90220000 1974.9382
## 7 tmrca(Calatomus) 23.5559000 7.51270000 311.7914
## 8 tmrca(Chaetodontidae) 46.7649000 12.11520000 916.3662
## 9 tmrca(Eastmanelepes) 51.6495000 2.57560000 5502.6245
## 10 tmrca(Eobuglossus) 43.7849000 2.50190000 3151.6457
## 11 tmrca(Eucoelopoma) 57.5543000 3.42390000 1856.2798
## 12 tmrca(Homonotichthys) 102.2346000 7.30590000 644.6914
## 13 tmrca(Mcconichthys) 68.5515000 5.09490000 414.6174
## 14 tmrca(Mene) 63.1729000 6.34290000 1284.0863
## 15 tmrca(Ramphexocoetus) 58.6012000 7.93930000 1061.6739
## 16 tmrca(Tarkus) 50.2542000 1.22650000 12594.1819
## 17 tmrca(Root) 180.7353000 20.05020000 269.2439
## 18 birthDeath.meanGrowthRate 0.0164000 0.00205960 736.8404
## 19 birthDeath.relativeDeathRate 0.0728000 0.06690000 57840.2356
## 20 ac 0.2325000 0.00782330 215610.0000
## 21 ag 0.9843000 0.03020000 86311.0508
## 22 at 0.2831000 0.00914440 170380.0000
## 23 cg 0.3228000 0.00954740 162640.0000
## 24 gt 0.2314000 0.00795450 169400.0000
## 25 frequencies1 0.2226000 0.00330600 123390.0000
## 26 frequencies2 0.2674000 0.00365870 99894.3990
## 27 frequencies3 0.2786000 0.00374410 103710.0000
## 28 frequencies4 0.2313000 0.00338230 122530.0000
## 29 alpha 10.5904000 0.69310000 403880.0000
## 30 ucld.mean 0.0033058 0.00033704 242.3424
## 31 ucld.stdev 0.9459000 0.03710000 508.3728
## 32 meanRate 0.0026357 0.00018346 231.1054
## 33 coefficientOfVariation 1.1671000 0.09060000 242.4046
## 34 covariance 0.1898000 0.06440000 331.9766
## 35 treeLikelihood -104408.0209000 10.99870000 37114.1447
## 36 speciation -595.0462000 8.20300000 229.6069
java -Xmx6000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.LogAnalyser -burnin 50000000 -ess fastest3rootcal.log
As the effective sample size of all parameters exceeded the threshold of 200, the tree distribution was summarized in TreeAnnotator using the following commands:
java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 50000000 -heights median fastest1rootcal.trees fastest1rootcal.tre
java -Xmx20000m -cp /home/analysis/.linuxbrew/lib/beast.jar dr.app.tools.TreeAnnotator -burnin 50000000 -heights median fastest2rootcal.trees fastest2rootcal.tre
fastest1rootcal BEAST time tree
fastest2rootcal BEAST time tree
The BEAST analyses performed in this step differed from those described above in the following features: (1) a root calibration was added to the original calibration set (as in Phase 2); (2) the clock model was changed from the lognormally distributed relaxed clock to random local clocks; and (3) the upper bound of the Homonotichthys calibration was decreased to 116.35 Ma by specifying a new mean for the corresponding offset exponential prior:
## [1] "Homonotichthys = 7.49399410561025"
The analysis was run under the fixed topology operator setup, all of whose components were kept at their default values. The chain length was set equal to 500 million generations, with a sampling frequency of 1 sample per 1000 generations.
beast -threads 12 -beagle_SSE fastest1rlc.xml
beast -threads 12 -beagle_SSE fastest2rlc.xml
beast -threads 12 -beagle_SSE fastest3rlc.xml
As noted in the mcmcTREE manual, several prior settings are of special interest when creating the control file: RootAge, rgene_gamma, and sigma2_gamma. Following Alfaro et al., the following prior was placed on the age of the root:
RootAge = 'B(9.8, 14.3, 1e-300, 0.05)'
To compute rgene_gamma, the baseml program from the PAML package was used to infer a mean substitution rate under the assumption of a strict clock and an age of 113.02 Ma for the root. This age was obtained as the mean of an exponential distribution with an offset of 98 Ma and the 95th percentile equal to 143 Ma:
mean <- 98 + (143 - 98)/qexp(0.95)
mean
## [1] 113.0214
A modified version of the tree file was used for the baseml analysis, with a single point calibration assigned to the root, in order to arrive at an estimate of the absolute substitution rate:
118 1
(((lampris_guttatus,(zu_elongatus,regalecus_glesne)),((polymixia_lowei,(percopsis_omiscomycus,aphredoderus_sayanus)),((zeus_faber,zenopsis_conchifera),(stylephorus_chordatus,(coryphaenoides_rupestris,gadus_morhua))))),(anoplogaster_cornuta,((rondeletia_loricata,(sargocentron_coruscum2,(myripristis_violacea,myripristis_leiognathus))),((carapus_bermudensis,lepophidium_profundorum),(batrachoides_pacifici,((kurtus_gulliveri,(((apogon_lateralis,ostorhinchus_nigrofasciatus),(apogon_maculatus,phaeoptyx_pigmentaria)),(odontobutis_obscura,((mogurnda_adspersa,erotelis_smaragdus),(ophiocara_porocephala,((bathygobius_soporator,(valenciennea_strigata,ptereleotris_evides)),(gillichthys_seta,(evorthodus_minutus,periophthalmus_barbarus)))))))),((((syngnathus_fuscus,aulostomus_sp),(synchiropus_ocellatus,(dactyloptena_orientalis,(parupeneus_multifasciatus,pseudupeneus_maculatus)))),((chiasmodon_niger,kali_normani),((scomber_scombrus,scomberomorus_maculatus),(((taractes_asper,brama_japonica),(ruvettus_pretiosus,assurger_anzac)),(psenopsis_anomala,(pomatomus_saltatrix,(cubiceps_baxteri,psenes_cyanophrys))))))),((((xenentodon_cancila,(pholidichthys_leucotaenia,pterophyllum_leopoldi)),((plesiops_sp,(pseudochromis_flavivertex,pseudochromis_sankeyi)),(chromis_enchrysura,(gramma_loreto,(scartella_cristata,(labrisomus_nuchipinnis,emblemariopsis_sp)))))),((centropomus_medius,sphyraena_putnamae),((polydactylus_sexfilis,(citharoides_macrolepis,((hippoglossina_oblonga,bothus_pantherinus),(parachirus_xenicus,poecilopsetta_plinthus)))),((toxotes_jaculatrix,(istiophorus_platypterus,mene_maculatus)),((coryphaena_hippurus,rachycentron_canadum),(seriola_zonata,(alectis_indicus,(caranx_melampygus,(chloroscombrus_orqueta,alepes_kleinii))))))))),((cephalopholis_argus,(hypoplectrus_puela,(taenianotus_triacanthus,pseudanthias_squamipinnis))),(micropterus_salmoides,((pempheris_schomburgkii,(cheimarrichthys_fosteri,((thalassoma_ballieui,(halichoeres_poeyi,halichoeres_radiatus)),(epibulus_insidiator,(scarus_trispinosus,scarus_ferrigineus))))),(((pareques_acuminatus,micropogonias_undulatus),(lutjanus_goreensis,(haemulon_album,(haemulon_flavilineatum,haemulon_parra)))),(((pomacanthus_paru,(chaetodon_ocellatus,chaetodon_kleinii)),(naso_unicornis,(acanthurus_pyroferus,(acanthurus_bahianus,(acanthurus_bariene,acanthurus_olivaceus))))),(antigonia_capros,((antennarius_striatus,(ogcocephalus_radiatus,cryptopsaras_couesii)),(balistes_capriscus,(takifugu_occelatus,(tetraodon_suvatti,(canthigaster_margaritata,(canthigaster_rostrata,canthigaster_figueiredoi))))))))))))))))))))'@1.1302';
Baseml estimates the following substitution rates (per 1 time unit, i.e., 100 million years):
| partition | fastest1 | largest1 |
|---|---|---|
| rate | 0.427164 | 0.023614 |
These values compare well to Alfaro et al.’s estimate of 0.098170 for the entire dataset, since fastest1 and largest1 represent unusually fast and unusually slow (respectively) subsets thereof.
Based on the mcmcTREE manual and following Alfaro et al., \(\alpha\) (the shape parameter of the gamma distribution) was set to 2 and \(\beta\) (rate parameter) was chosen so that the mean \(\frac{\alpha}{\beta}\) was equal to the respective baseml estimate rescaled per 10 million (rather than 100 million) years:
## [1] "beta_fastest1 = 46.8204249421768"
## [1] "beta_largest1 = 846.955196070128"
| parameter | fastest1 | largest1 |
|---|---|---|
| \(\alpha\) (shape) | 2 | 2 |
| \(\beta\) (rate) | 46.82 | 846.96 |
corresponding to the following priors:
Note that the sigma2_gamma prior employed here is highly consistent with the ucld.stdev prior selected for the BEAST analyses (a truncated exponential with a mean of 0.3). If the standard deviation of the logarithm of the rate is to be 0.3, the variation of the logarithm of the rate must be set to 0.09. Therefore, the mean \(\left(\frac{\alpha}{\beta}\right)\) of the gamma prior on log rate variance should be equal to 0.09. Since a gamma distribution reduces to an exponential when \(\alpha\) is set to 1, one can approximate the BEAST prior by setting \(\alpha\) to 1 and \(\beta\) to 1/0.09, which is here rounded to 10.
Since the independent rates of the relaxed clock model are drawn from a lognormal distribution whose mean \(\mu\) and variance \(\sigma^2\) are determined by the rgene_gamma and sigma2_gamma hyperpriors, the PAML manual recommends plotting this distribution to make sure it is reasonable:
\[ \begin{align*} \mu &\sim \Gamma\left(\alpha_1,\beta_1\right) & \mathbb{E}[\mu] &= \frac{\alpha_1}{\beta_1} & \mathrm{Var}[\mu] &= \frac{\alpha_1}{{\beta_1}^2} \\ \sigma^2 &\sim \Gamma\left(\alpha_2,\beta_2\right) & \mathbb{E}[\sigma^2] &= \frac{\alpha_2}{{\beta_2}} & \mathrm{Var}[\sigma^2] &= \frac{\alpha_2}{{\beta_2}^2} \end{align*} \]
For fastest1, this corresponds to:
alpha_1 <- 2
beta_1 <- 46.82
alpha_2 <- 1
beta_2 <- 10
## Note that while mu is the mean substitution rate, sigma squared is the variance
## of the logarithm of the rate rather than simply the variance of the rate.
## Note also that since alpha_2/beta_2 gives the prior mean of the variance, the
## prior mean of the corresponding st. dev. is equal to the square root of the ratio.
curve(dlnorm(x, meanlog = log(alpha_1/beta_1), sdlog = sqrt(alpha_2/beta_2)), from=0, to=1,
main = "Relaxed clock rate distribution", xlab = "Rate", ylab = "Probability density")
The prior choices described above were implemented in the following control file (for fastest1; the control file for largest1 only differed in the values of rgene_gamma):
seed = -1
seqfile = 3ad39b9161154796bf2aef1aa746b469.phy
treefile = 12_cali_no_outgroups_scheme_3.tre
outfile = fastest1.txt
ndata = 1 * No further partitioning; each run corresponds to 1 partition
seqtype = 0 * Data type: nucleotides
usedata = 3 * Store the Hessian matrix for approximate likelihood computation in out.BV
clock = 2 * Independent rates
RootAge = 'B(9.8, 14.3, 1e-300, 0.05)' * P of being younger than 98 Ma = 10^(-300) and P of being older than 143 Ma = 0.05
model = 4 * Most complex model available
alpha = 0.1 * Following Alfaro et al.
ncatG = 8 * Following Alfaro et al.
cleandata = 0 * Do not remove sites with ambiguity
BDparas = 0.1 0.1 00.1 * Birth, death, sampling
kappa_gamma = 6 2 * No effect since usedata = 3
alpha_gamma = 1 1 * No effect since usedata = 3
rgene_gamma = 2 46.82 * Gamma prior on substitution rate: estimated using baseml under strict clock
sigma2_gamma = 1 10 * Gamma prior on log rate variance: independent of time unit choice since clock = 2
finetune = 1: .1 .1 .1 .1 .1 .1 * Auto finetune: times, musigma2, rates, mixing, paras, FossilErr
print = 2 * Print branch rates into an output file
burnin = 500000 * Following Alfaro et al.
sampfreq = 500 * Following Alfaro et al.
nsample = 15000 * Following Alfaro et al.
The mcmcTREE analyses were run as follows:
mcmctree fastest1mcmctree.ctl
mcmctree largest1mcmctree.ctl
The Hessian matrix was generated and stored in out.BV files (fastest1: time used = 4:29, largest1: time used = 36:58). These were renamed as in.BV, with the control files modified in vi to read as follows: usedata = 2. The analyses were then started again using the commands above, and finished in 70:29:15 (fastest1) and 67:28:12 (largest1), yielding the following trees:
fastest1 PAML time tree
largest1 PAML time tree
Given the great disparity between the results of the two analyses, and their relatively fast run times, the remaining four partitions were subjected to the same sequence of analyses as outlined above. The following rate estimates were obtained using baseml:
| partition | fastest2 | fastest3 | largest2 | largest3 |
|---|---|---|---|---|
| rate | 0.056057 | 0.372665 | 0.000100 | 0.029088 |
After rescaling, these can be used to calculate the hyperparameters of the rgene_gamma prior:
## [1] "beta_fastest2 = 356.779706370302"
## [1] "beta_fastest3 = 53.6675029852548"
## [1] "beta_largest2 = 200000"
## [1] "beta_largest3 = 687.568756875688"
| parameter | fastest2 | fastest3 | largest2 | largest3 |
|---|---|---|---|---|
| \(\alpha\) (shape) | 2 | 2 | 2 | 2 |
| \(\beta\) (rate) | 356.78 | 53.67 | 200000 | 687.57 |
This yields the following prior distributions:
Note that a different x-axis scale was used for the largest2 prior, whose probability mass is concentrated into an extremely short interval close to 0.
New control files were created for fastest2 and fastest3 with settings identical to those in the first two files, except for the different values of the rgene_gamma \(\beta\) hyperparameter. The sigma2_gamma hyperparameters were left unmodified, with \(\sigma^2 \sim \Gamma\left(\alpha = 1, \beta = 10\right)\), except for the “largest2” partition, for which this setting would result in an extremely informative rate prior favoring values very close to 0. The sigma2_gamma hyperparameters for largest2 (\(\alpha = 1\), \(\beta = 0.1\)) were chosen so as to make the resulting lognormal density roughly comparable in shape to those used for the analyses of the other partitions:
After calculating the Hessian matrices (fastest2: time used = 3:34, fastest3: time used = 4:10, largest3: time used = 15:06), new analyses were started with usedata set to 2. The only exception was the “largest2” partition, whose baseml analysis finished with the following warning message: xmax = 0.0000e+00 close to zero at 235!. Given that this message was displayed in repeated trials, no attempt was made to proceed with mcmcTREE analysis. The analyses of the remaining partitions finished in 71:36:39 (fastest2), 71:14:32 (fastest3), and 49:52:29 (largest3) with the following results:
fastest2 PAML time tree
fastest3 PAML time tree
largest3 PAML time tree
A new set of mcmcTREE analyses was run on the “fastest1”, “fastest2”, “fastest3”, “largest1”, and “largest3” partitions under the same extended set of calibrations used in Phase 4 of the PhyloBayes analyses. These were incorporated into the following tree file:
118 1
(((lampris_guttatus,(zu_elongatus,regalecus_glesne)'B(0.533,9.4,1e-300,0.05)')'B(5.417,12.25,1e-300,0.05)',((polymixia_lowei,(percopsis_omiscomycus,aphredoderus_sayanus)'B(5.8551,10.96,1e-300,0.05)')'B(9.39,12.8,1e-300,0.05)',((zeus_faber,zenopsis_conchifera)'B(3.202,9.14,1e-300,0.05)',(stylephorus_chordatus,(coryphaenoides_rupestris,gadus_morhua)'B(5.37,9.44,1e-300,0.05)'))'B(6.971,11.13,1e-300,0.05)'))'B(9.8,14.3,1e-300,0.05)',(anoplogaster_cornuta,((rondeletia_loricata,(sargocentron_coruscum2,(myripristis_violacea,myripristis_leiognathus))'B(4.9,9.37,1e-300,0.05)'),((carapus_bermudensis,lepophidium_profundorum),(batrachoides_pacifici,((kurtus_gulliveri,(((apogon_lateralis,ostorhinchus_nigrofasciatus),(apogon_maculatus,phaeoptyx_pigmentaria)),(odontobutis_obscura,((mogurnda_adspersa,erotelis_smaragdus),(ophiocara_porocephala,((bathygobius_soporator,(valenciennea_strigata,ptereleotris_evides)),(gillichthys_seta,(evorthodus_minutus,periophthalmus_barbarus)))))))),((((syngnathus_fuscus,aulostomus_sp)'B(4.9,9.37,1e-300,0.05)',(synchiropus_ocellatus,(dactyloptena_orientalis,(parupeneus_multifasciatus,pseudupeneus_maculatus))))'B(6.971,11.13,1e-300,0.05)',((chiasmodon_niger,kali_normani),((scomber_scombrus,scomberomorus_maculatus)'B(5.417,9.42,1e-300,0.05)',(((taractes_asper,brama_japonica),(ruvettus_pretiosus,assurger_anzac)),(psenopsis_anomala,(pomatomus_saltatrix,(cubiceps_baxteri,psenes_cyanophrys))))))),((((xenentodon_cancila,(pholidichthys_leucotaenia,pterophyllum_leopoldi)'B(4.546,6.74,1e-300,0.05)')'B(4.9,7.97,1e-300,0.05)',((plesiops_sp,(pseudochromis_flavivertex,pseudochromis_sankeyi)),(chromis_enchrysura,(gramma_loreto,(scartella_cristata,(labrisomus_nuchipinnis,emblemariopsis_sp)))))),((centropomus_medius,sphyraena_putnamae)'B(4.9,7.97,1e-300,0.05)',((polydactylus_sexfilis,(citharoides_macrolepis,((hippoglossina_oblonga,bothus_pantherinus),(parachirus_xenicus,poecilopsetta_plinthus)'B(4.12,6.66,1e-300,0.05)'))'B(4.9,7.97,1e-300,0.05)'),((toxotes_jaculatrix,(istiophorus_platypterus,mene_maculatus)'B(5.52,9.46,1e-300,0.05)'),((coryphaena_hippurus,rachycentron_canadum),(seriola_zonata,(alectis_indicus,(caranx_melampygus,(chloroscombrus_orqueta,alepes_kleinii))))'B(4.9,6.93,1e-300,0.05)')'B(5.417,8.08,1e-300,0.05)')))),((cephalopholis_argus,(hypoplectrus_puela,(taenianotus_triacanthus,pseudanthias_squamipinnis))),(micropterus_salmoides,((pempheris_schomburgkii,(cheimarrichthys_fosteri,((thalassoma_ballieui,(halichoeres_poeyi,halichoeres_radiatus)),(epibulus_insidiator,(scarus_trispinosus,scarus_ferrigineus))'B(1.19,6.33,1e-300,0.05)'))),(((pareques_acuminatus,micropogonias_undulatus),(lutjanus_goreensis,(haemulon_album,(haemulon_flavilineatum,haemulon_parra)))),(((pomacanthus_paru,(chaetodon_ocellatus,chaetodon_kleinii))'B(2.962,6.6,1e-300,0.05)',(naso_unicornis,(acanthurus_pyroferus,(acanthurus_bahianus,(acanthurus_bariene,acanthurus_olivaceus))))'B(4.9,6.93,1e-300,0.05)')'B(5.417,8.08,1e-300,0.05)',(antigonia_capros,((antennarius_striatus,(ogcocephalus_radiatus,cryptopsaras_couesii)'B(4.9,6.16,1e-300,0.05)'),(balistes_capriscus,(takifugu_occelatus,(tetraodon_suvatti,(canthigaster_margaritata,(canthigaster_rostrata,canthigaster_figueiredoi))))'B(3.202,5.85,1e-300,0.05)')'B(5.05,6.96,1e-300,0.05)'))))))))))))))'B(9.39,12.8,1e-300,0.05)')'B(9.8,14.3,1e-300,0.05)';
All other settings were identical to those used for the first round of mcmcTREE analyses, including the rgene_gamma and sigma2_gamma hyperprior values. However, the RootAge line was deleted from the control file, as the corresponding calibration density (a uniform density between 98 and 143 Ma, with soft bounds such that P(<98) = \(10^{-300}\) and P(>143) = 0.05) was already included in the tree file. This produced the resulting control file (fastest1 file shown as an example):
seed = -1
seqfile = 3ad39b9161154796bf2aef1aa746b469.phy
treefile = 12_extended_cali_no_outgroups.tre
outfile = fastest1ext.txt
ndata = 1 * No further partitioning; each run corresponds to 1 partition
seqtype = 0 * Data type: nucleotides
usedata = 3 * Store the Hessian matrix for approximate likelihood computation in out.BV
clock = 2 * Independent rates
model = 4 * Most complex model available
alpha = 0.1 * Following Alfaro et al.
ncatG = 8 * Following Alfaro et al.
cleandata = 0 * Do not remove sites with ambiguity
BDparas = 0.1 0.1 00.1 * Birth, death, sampling
kappa_gamma = 6 2 * No effect since usedata = 3
alpha_gamma = 1 1 * No effect since usedata = 3
rgene_gamma = 2 46.82 * Gamma prior on substitution rate: estimated using baseml under strict clock
sigma2_gamma = 1 10 * Gamma prior on log rate variance: independent of time unit choice since clock = 2
finetune = 1: .1 .1 .1 .1 .1 .1 * Auto finetune: times, musigma2, rates, mixing, paras, FossilErr
print = 2 * Print branch rates into an output file
burnin = 500000 * Following Alfaro et al.
sampfreq = 500 * Following Alfaro et al.
nsample = 15000 * Following Alfaro et al.
The same procedure was used as in the first round of analyses, with the usedata value changed from 3 to 2 after generating the Hessian matrix, which records the second derivatives of the likelihood function at its maximum values. The out.BV file, in which the matrix was stored, was renamed as in.BV and served as an input for the second stage of the analysis (node age estimation proper).
The analyses finished in 39:32:49 (fastest1ext), 39:29:45 (fastest2ext), 39:24:17 (fastest3ext), and 36:33:56 (largest1ext; duration of the largest3ext analysis was not recorded) with the following results:
fastest1ext PAML time tree
fastest2ext PAML time tree
fastest3ext PAML time tree
largest1ext PAML time tree
largest3ext PAML time tree
All of the mcmcTREE analyses above were performed with piecewise calibration densities that consisted of (1) a power-decay function left of the lower bound, (2) a uniform density between the lower and upper bounds, and (3) an exponential-decay function right of the upper bound. Different portions of the total probability mass can be allocated to these 3 densities: in the tree and control files shown above, these fractions were set equal to \(P = 10^{-300}\) for (1) and \(P =\) 0.05 for (3) to make the lower bounds as “hard” as possible while keeping the upper bounds relatively soft. However, it has been noted that calibrated divergences are generally more likely to take place shortly before the occurrence of the calibration fossil than to predate it by a large margin (Brown & van Tuinen 2011), suggesting that the uniform distribution might not be the best description of the uncertainty associated with divergence ages.
Similar to the PhyloBayes analyses described above, the mcmcTREE analyses were re-run under a corrected version of the original guide tree, in which the upper bound of the Homonotichthys calibration placed on the (Polymixia + Aphredoderus) node was decreased from 128.04 Ma to 116.35 Ma. All other settings were retained:
mcmctree fastest1corrmcmctree.ctl
mcmctree fastest2corrmcmctree.ctl
mcmctree fastest3corrmcmctree.ctl
mcmctree largest1corrmcmctree.ctl
mcmctree largest3corrmcmctree.ctl
The preliminary analyses produced the output in the form of a Hessian matrix in 2:32 (fastest1corr), 2:05 (fastest2corr), 2:11 (fastest3corr), 19:09 (largest1corr), and 11:52 (largest3corr), respectively. The matrix file were renamed from out.BV to in.BV and new mcmcTREE analyses were launched based on modified versions of the original control files, in which the value of usedata was changed from 3 to 2. These analyses finished in 40:18:24 (fastest1corr), 40:17:43 (fastest2corr), 38:56:28 (fastest3corr), 36:03:15 (largest1corr), and 37:21:11 (largest3corr), producing the results shown below:
fastest1corr PAML time tree
fastest2corr PAML time tree
fastest3corr PAML time tree
largest1corr PAML time tree
largest3corr PAML time tree
The partitions used for the analyses above seem to show a systematic bias: the fastest partitions tend to allocate evolutionary change to individual branches as evenly as possible, showing no rapid radiations. On the other hand, the largest partitions, which are also the slowest ones, tend to produce trees with explosive diversification bursts that are unrealistically comprehensive, often subsuming even intrageneric divergences that would be expected to be an order of magnitude younger (see PhyloBayes tree largest3ext and PAML trees largest3, largest1ext, and largest3ext above; note the ages of Acanthurus, Haemulon, Pseudochromis, and Scarus).
Therefore, it might be useful to infer time trees from partitions whose rates are neither extremely slow nor extremely fast, in order to see if they can provide more biologically realistic estimates that could detect rapid radiations at multiple time scales. Three such partitions were selected based on the absolute deviation from the median rate multiplier:
acanth <- read.delim("/Users/David/Grive/Alfaro_Lab/PartitionFinder/k-means/Acanthomorpha_Run_2/Rate_multipliers_Acanthomorpha.txt", header=TRUE)
colnames(acanth) <- c("Subset_ID", "Rate_multiplier", "Sites")
# Find the absolute deviation from the median
acanth$Diff_from_median <- 0
for(i in 1:nrow(acanth)) {
acanth[i,"Diff_from_median"] <- abs(acanth[i,"Rate_multiplier"] - median(acanth$Rate_multiplier))
}
# Select the 3 partitions with rates closest to the median
head(acanth[order(acanth$Diff_from_median),], n=3)
## Subset_ID Rate_multiplier Sites Diff_from_median
## 23 2d1d94d7d53bf577c912ae0b72fe2274 1.421441 6442 0.05932
## 12 95d43d8ebf63945dc31116a80872c4ed 1.302801 5837 0.05932
## 27 f495e4e0f2f9bbbf091f778067b062f4 1.153401 6752 0.20872
The following preliminary (uncalibrated) PhyloBayes analyses were run in order to determine the hyperparameters of the birth-death process that is to be used as a branching-process prior in the subsequent calibrated runs:
../pb -d 2d1d94d7d53bf577c912ae0b72fe2274.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 median1nocal
../pb -d 95d43d8ebf63945dc31116a80872c4ed.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 median2nocal
../pb -d f495e4e0f2f9bbbf091f778067b062f4.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 median3nocal
The uncalibrated PhyloBayes runs were soft-stopped on Tuesday 02/07/2017, 0:56 AM (median1nocal, at 24665 generations), 00:58 AM (median2nocal, at 26673 generations), and 00:59 AM (median3nocal, at 23814 generations). The respective ESS values are shown in the table below:
| parameter | median1 | median2 | median3 |
|---|---|---|---|
| states | 24665 | 26673 | 23814 |
| loglik | 16074 | 17514 | 15944 |
| length | 12975 | 14133 | 14012 |
| sigma | 12107 | 11560 | 8863 |
| mu | 1130 | 1742 | 1889 |
| meanrates | 4357 | 6111 | 4592 |
| p1 | 11488 | 14089 | 13425 |
| p2 | 11732 | 19865 | 17218 |
| alpha | 17900 | 20523 | 17721 |
| nmode | - | - | - |
| stat | 9578 | 10150 | 10546 |
| statalpha | - | - | - |
| rrent | 12039 | 12712 | 12122 |
| meanrr | 1491 | 1828 | 1540 |
| kappa | - | - | - |
| allocent | - | - | - |
The parameters of interest – i.e., p1 and p2 – were found using the following commands. Note that the greatest autocorrelation times detected in the three analyses were 19.65 (median1: mu), 13.78 (median2: mu), and 13.92 (median3: meanrr), making the subsampling frequency applied in readdiv conservative.
../readdiv -x 2467 50 median1nocal
../readdiv -x 2667 50 median2nocal
../readdiv -x 2381 50 median3nocal
Resulting in the following estimates of the parameters of interest:
| chain | p1 | p2 | ESS |
|---|---|---|---|
| median1 | 0.00120075 +/- 0.000456566 | 9.00163e-05 +/- 7.51772e-05 | 443 |
| median2 | 0.0016755 +/- 0.000425401 | 5.38323e-05 +/- 5.2924e-05 | 480 |
| median3 | 0.00176085 +/- 0.000419111 | 4.24065e-05 +/- 3.91027e-05 | 428 |
Based on these results, calibrated analyses were launched using the corrected calibration file:
../pb -d 2d1d94d7d53bf577c912ae0b72fe2274.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00120075 0.0000900163 -cal calib_root_corrected -sb -gtr -dgam 4 median1cal
../pb -d 95d43d8ebf63945dc31116a80872c4ed.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.0016755 0.0000538323 -cal calib_root_corrected -sb -gtr -dgam 4 median2cal
../pb -d f495e4e0f2f9bbbf091f778067b062f4.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00176085 0.0000424065 -cal calib_root_corrected -sb -gtr -dgam 4 median3cal
The calibrated analyses were soft-stopped on 02/11/2017 at 11:38 PM (median1cal: 10826 generations) and on 02/12/2017 at 4:24 PM (median2cal: 14019 generations) and 4:27 PM (median3cal: 12654 generations). The following ESS values were obtained using Tracer:
| parameter | median1 | median2 | median3 |
|---|---|---|---|
| states | 10826 | 14019 | 12654 |
| loglik | 5836 | 6916 | 6427 |
| length | 5995 | 6648 | 6541 |
| sigma | 4452 | 7594 | 5470 |
| mu | 259 | 860 | 664 |
| meanrates | 503 | 579 | 224 |
| p1 | - | - | - |
| p2 | - | - | - |
| scale | 343 | 482 | 1309 |
| alpha | 8385 | 10124 | 9464 |
| nmode | - | - | - |
| stat | 4695 | 5952 | 5944 |
| statalpha | - | - | - |
| rrent | 5511 | 6854 | 6563 |
| meanrr | 819 | 955 | 710 |
| kappa | - | - | - |
| allocent | - | - | - |
Based on the greatest recorded autocorrelation times of 37.59 (median1: mu), 26.18 (median2: scale), and 50.85 (median3: meanrates), readdiv was run as follows:
../readdiv -x 1083 50 median1cal
../readdiv -x 1402 50 median2cal
../readdiv -x 1265 55 median3cal
Producing the trees below:
median1 PhyloBayes time tree
median2 PhyloBayes time tree
median3 PhyloBayes time tree
The posterior mean ages listed in the corresponding _sample.dates were examined to detect possible underflows (bold) and overflows (italics) with regard to the specified calibrations:
| Calibration | Lower bound | Upper bound | Node number | median1 | median2 | median3 |
|---|---|---|---|---|---|---|
| Aipichthys | 98.0 | 128.82 | 119 | 117.529 | 109.728 | 111.156 |
| Berybolcensis | 49.0 | 109.29 | 132 | 76.1029 | 65.009 | 70.1035 |
| Calatomus | 11.9 | 43.95 | 210 | 38.0123 | 38.3904 | 40.3444 |
| Chaetodontidae | 29.62 | 59.26 | 220 | 57.9509 | 58.2777 | 57.9137 |
| Eobuglossus | 41.2 | 53.88 | 189 | 50.1875 | 50.5963 | 49.0592 |
| Eucoelopoma | 54.17 | 95.58 | 162 | 84.971 | 65.6994 | 74.1213 |
| Homonotichthys | 93.9 | 116.35 | 123 | 108.916 | 99.1346 | 100.816 |
| Mcconichthys | 63.1 | 93.51 | 124 | 41.6338 | 42.3599 | 43.1568 |
| Ramphexocoetus | 49.0 | 80.52 | 173 | 76.3036 | 78.6618 | 78.9618 |
| Tarkus | 49.0 | 53.93 | 229 | 51.0414 | 51.9401 | 51.7883 |
| Mene | 55.2 | 95.64 | 192 | 93.8209 | 91.1315 | 90.3707 |
| Eastmanelepes | 49.0 | 61.61 | 195 | 58.7614 | 58.2695 | 57.9235 |
More detailed information about the underflows and overflows is available in the corresponding _sample.calib files.
As in the previously used partitions, the rgene_gamma prior was computed by running the baseml program from the PAML package on a tree whose root was calibrated by a point age of 113.02 Ma. All the settings specified in the baseml control files were retained.
The baseml analyses finished in 26:51:37 (median1), 9:51:01 (median2), and 28:04:54 (median3), with the following substitution rate estimates (per a time unit of 100 million years):
| partition | median1 | median2 | median3 |
|---|---|---|---|
| rate | 0.154514 | 0.141265 | 0.128926 |
These were used to determine the values of the rgene_gamma prior in the following way (assuming an \(\alpha\) value of 2):
## [1] "beta_median1 = 129.438109168101"
## [1] "beta_median2 = 141.57788553428"
## [1] "beta_median3 = 155.127747700231"
Resulting in the following estimates of \(\beta\):
| parameter | median1 | median2 | median3 |
|---|---|---|---|
| \(\alpha\) (shape) | 2 | 2 | 2 |
| \(\beta\) (rate) | 129.44 | 141.58 | 155.13 |
The sigma2_gamma prior was identical to that used in the previous analyses (i.e., a gamma distribution with the shape parameter equal to 1 and the rate parameter equal to 10). The priors placed on the mean substitution rate of each partition are shown below, as are the overall lognormal rate distribution whose mean and variance are determined by the means of the respective hyperpriors (i.e., by the means of rgene_gamma and sigma2_gamma, respectively):
These values of rgene_gamma and sigma2_gamma were subsequently incorporated into a new set of mcmcTREE control files, such as the one shown below (for median1). Note that all of the analyses were run on the corrected tree, for which the soft upper bound on the Homonotichthys calibration was lowered to 116.35 Ma.
seed = -1
seqfile = 2d1d94d7d53bf577c912ae0b72fe2274.phy
treefile = 12_cali_no_outgroups_corrected.tre
outfile = median1.txt
ndata = 1 * No further partitioning; each run corresponds to 1 partition
seqtype = 0 * Data type: nucleotides
usedata = 3 * Store the Hessian matrix for approximate likelihood computation in out.BV
clock = 2 * Independent rates
RootAge = 'B(9.8, 14.3, 1e-300, 0.05)' * P of being younger than 98 Ma = 10^(-300) and P of being older than 143 Ma = 0.05
model = 4 * Most complex model available
alpha = 0.1 * Following Alfaro et al.
ncatG = 8 * Following Alfaro et al.
cleandata = 0 * Do not remove sites with ambiguity
BDparas = 0.1 0.1 00.1 * Birth, death, sampling
kappa_gamma = 6 2 * No effect since usedata = 3
alpha_gamma = 1 1 * No effect since usedata = 3
rgene_gamma = 2 129.44 * Gamma prior on substitution rate: estimated using baseml under strict clock
sigma2_gamma = 1 10 * Gamma prior on log rate variance: independent of time unit choice since clock = 2
finetune = 1: .1 .1 .1 .1 .1 .1 * Auto finetune: times, musigma2, rates, mixing, paras, FossilErr
print = 2 * Print branch rates into an output file
burnin = 500000 * Following Alfaro et al.
sampfreq = 500 * Following Alfaro et al.
nsample = 15000 * Following Alfaro et al.
The mcmcTREE analyses were run as follows:
mcmctree median1mcmctree.ctl
mcmctree median2mcmctree.ctl
mcmctree median3mcmctree.ctl
The Hessian matrices of the likelihood functions were generated in 4:55 (median1), 1:01:07 (median2), and 14:15 (median3). The respective files were renamed as in.BV and the “usedata” line was set to a value of 2 in in the original control files. The analyses were then re-launched using commands identical to those shown above and finished in 39:47:21 (median1), 39:39:56 (median2), and 39:28:41 (median3) with the following results:
median1 PAML time tree
median2 PAML time tree
median3 PAML time tree
All 3 partitions were then re-analyzed under the same prior settings, but with the extended list of calibrations. Note that the guide tree specifying the extended set of calibrations also misspecified the upper bound on the (Polymixia + Aphredoderus) node, and had to be corrected:
118 1
(((lampris_guttatus,(zu_elongatus,regalecus_glesne)'B(0.533,9.4,1e-300,0.05)')'B(5.417,12.25,1e-300,0.05)',((polymixia_lowei,(percopsis_omiscomycus,aphredoderus_sayanus)'B(5.8551,10.96,1e-300,0.05)')'B(9.39,11.635,1e-300,0.05)',((zeus_faber,zenopsis_conchifera)'B(3.202,9.14,1e-300,0.05)',(stylephorus_chordatus,(coryphaenoides_rupestris,gadus_morhua)'B(5.37,9.44,1e-300,0.05)'))'B(6.971,11.13,1e-300,0.05)'))'B(9.8,14.3,1e-300,0.05)',(anoplogaster_cornuta,((rondeletia_loricata,(sargocentron_coruscum2,(myripristis_violacea,myripristis_leiognathus))'B(4.9,9.37,1e-300,0.05)'),((carapus_bermudensis,lepophidium_profundorum),(batrachoides_pacifici,((kurtus_gulliveri,(((apogon_lateralis,ostorhinchus_nigrofasciatus),(apogon_maculatus,phaeoptyx_pigmentaria)),(odontobutis_obscura,((mogurnda_adspersa,erotelis_smaragdus),(ophiocara_porocephala,((bathygobius_soporator,(valenciennea_strigata,ptereleotris_evides)),(gillichthys_seta,(evorthodus_minutus,periophthalmus_barbarus)))))))),((((syngnathus_fuscus,aulostomus_sp)'B(4.9,9.37,1e-300,0.05)',(synchiropus_ocellatus,(dactyloptena_orientalis,(parupeneus_multifasciatus,pseudupeneus_maculatus))))'B(6.971,11.13,1e-300,0.05)',((chiasmodon_niger,kali_normani),((scomber_scombrus,scomberomorus_maculatus)'B(5.417,9.42,1e-300,0.05)',(((taractes_asper,brama_japonica),(ruvettus_pretiosus,assurger_anzac)),(psenopsis_anomala,(pomatomus_saltatrix,(cubiceps_baxteri,psenes_cyanophrys))))))),((((xenentodon_cancila,(pholidichthys_leucotaenia,pterophyllum_leopoldi)'B(4.546,6.74,1e-300,0.05)')'B(4.9,7.97,1e-300,0.05)',((plesiops_sp,(pseudochromis_flavivertex,pseudochromis_sankeyi)),(chromis_enchrysura,(gramma_loreto,(scartella_cristata,(labrisomus_nuchipinnis,emblemariopsis_sp)))))),((centropomus_medius,sphyraena_putnamae)'B(4.9,7.97,1e-300,0.05)',((polydactylus_sexfilis,(citharoides_macrolepis,((hippoglossina_oblonga,bothus_pantherinus),(parachirus_xenicus,poecilopsetta_plinthus)'B(4.12,6.66,1e-300,0.05)'))'B(4.9,7.97,1e-300,0.05)'),((toxotes_jaculatrix,(istiophorus_platypterus,mene_maculatus)'B(5.52,9.46,1e-300,0.05)'),((coryphaena_hippurus,rachycentron_canadum),(seriola_zonata,(alectis_indicus,(caranx_melampygus,(chloroscombrus_orqueta,alepes_kleinii))))'B(4.9,6.93,1e-300,0.05)')'B(5.417,8.08,1e-300,0.05)')))),((cephalopholis_argus,(hypoplectrus_puela,(taenianotus_triacanthus,pseudanthias_squamipinnis))),(micropterus_salmoides,((pempheris_schomburgkii,(cheimarrichthys_fosteri,((thalassoma_ballieui,(halichoeres_poeyi,halichoeres_radiatus)),(epibulus_insidiator,(scarus_trispinosus,scarus_ferrigineus))'B(1.19,6.33,1e-300,0.05)'))),(((pareques_acuminatus,micropogonias_undulatus),(lutjanus_goreensis,(haemulon_album,(haemulon_flavilineatum,haemulon_parra)))),(((pomacanthus_paru,(chaetodon_ocellatus,chaetodon_kleinii))'B(2.962,6.6,1e-300,0.05)',(naso_unicornis,(acanthurus_pyroferus,(acanthurus_bahianus,(acanthurus_bariene,acanthurus_olivaceus))))'B(4.9,6.93,1e-300,0.05)')'B(5.417,8.08,1e-300,0.05)',(antigonia_capros,((antennarius_striatus,(ogcocephalus_radiatus,cryptopsaras_couesii)'B(4.9,6.16,1e-300,0.05)'),(balistes_capriscus,(takifugu_occelatus,(tetraodon_suvatti,(canthigaster_margaritata,(canthigaster_rostrata,canthigaster_figueiredoi))))'B(3.202,5.85,1e-300,0.05)')'B(5.05,6.96,1e-300,0.05)'))))))))))))))'B(9.39,12.8,1e-300,0.05)')'B(9.8,14.3,1e-300,0.05)';
mcmctree median1extmcmctree.ctl
mcmctree median2extmcmctree.ctl
mcmctree median3extmcmctree.ctl
The control files referenced in the commands given above were completely identical to those used in the first round of median-rate partition analyses, except for the guide tree specifying topology and divergence time priors.
When the analyses run under the usedata = 2 option finished after 5:31 (median1ext), 1:25:50 (median2ext), and 17:26 (median3ext), respectively, and the names of the Hessian matrix files as well as the control file usedata values were modified in the way described above, new mcmcTREE analyses were launched. These finished in 39:27:41 (median1ext), 38:59:27 (median2ext), and 39:21:48 (median3ext), with the following results:
median1ext PAML time tree
median2ext PAML time tree
median3ext PAML time tree
Given the discrepancies between the results based on the fastest and slowest partitions, PartitionFinder was re-run on the full original alignment under the rcluster algorithm. Unlike the k-means algorithm, rcluster requires a set of prespecified partitions that may be grouped together in various ways but not split. While such an a priori partitioning scheme can easily be specified for exon data (e.g., by locus, by codon position, or by both), there is no obvious biologically motivated equivalent for ultraconserved elements. Therefore, the alignment was split into blocks of length 100 bases using the following code, which also ensures that the output is in the format required by PartitionFinder configuration files:
options(scipen=999)
# This is important for preventing R from converting the numbers "100000", "200000", and "300000"
# into scientific notation, which is not allowed in PartitionFinder configuration files
# How many rows are needed?
n <- ceiling(302488/100)
partitionname <- vector()
for(i in 1:n) {
partitionname <- append(partitionname, paste("partition", i, sep = ""))
}
# Verify length
length(partitionname) - n
## [1] 0
start <- c(1)
for(i in 1:(n-1)) {
start <- append(start, 100*i)
}
# Verify length:
length(start) - n
## [1] 0
end <- c(99)
for(i in 1:(n-1)) {
end <- append(end, 100*i + 99)
}
# Verify length:
length(end) - n
## [1] 0
# Modify the last data block
end[3025] <- 302488
df <- data.frame(partitionname, start, end)
# Uncomment to print the file
# for(i in 1:n) {
# write(paste(df[i,"partitionname"], " = ", df[i,"start"], " - ", df[i,"end"], ";", sep = ""),
# "/Users/David/Grive/Alfaro_Lab/PartitionFinder/rcluster/data_blocks.txt",
# append=TRUE)
# }
This produced the configuration file below (first 30 lines shown):
alignment = acanthomorph-gblocks-clean-min-90-taxa;
user_tree_topology = ExaBayes_ConsensusExtendedMajorityRuleNewick_Acanthomorph.tre;
branchlengths = linked;
models = GTR+G;
model_selection = BIC;
[data_blocks]
partition1 = 1 - 99;
partition2 = 100 - 199;
partition3 = 200 - 299;
partition4 = 300 - 399;
partition5 = 400 - 499;
partition6 = 500 - 599;
partition7 = 600 - 699;
partition8 = 700 - 799;
partition9 = 800 - 899;
partition10 = 900 - 999;
partition11 = 1000 - 1099;
partition12 = 1100 - 1199;
partition13 = 1200 - 1299;
partition14 = 1300 - 1399;
partition15 = 1400 - 1499;
partition16 = 1500 - 1599;
partition17 = 1600 - 1699;
partition18 = 1700 - 1799;
partition19 = 1800 - 1899;
The configuration file shown above was run with the following command line options:
python PartitionFinder.py /home/analysis/partitionfinder-new/acanth-75p-rcluster/partition_finder.cfg --verbose --save-phylofiles --raxml
Note that the --rcluster-max and --rcluster-percent were left unspecified, effectively keeping them at their default values of 1000 and 10, respectively.
The list of final subsets given in the best-scheme file was manually extracted and saved into a new file, from which three subsets were selected at random by their IDs. The alignments selected were discarded if they did not reach a minimum length of 2000 sites, and the process was repeated until three sufficiently long alignments have been collected.
rcluster <- read.delim("/Users/David/Grive/Alfaro_Lab/PartitionFinder/rcluster/subsetid.txt", sep = "|", strip.white = T, header=TRUE)
sample(rcluster[, "subset.id"], 3, replace=FALSE)
## [1] 412a9fd21526cc219030c6131988d653 80ff35254c2adb8167d9708ed4d92b4c
## [3] 90f81d9fbb54052747e711e6fe27b5cb
## 95 Levels: 01ae33a19b7d0b8db59c43886d68d1bf ...
| Subset ID | Sites | Name of chain |
|---|---|---|
| 08d545f1bb470785ba7a046efdafcfe7 | 4900 | rcluster1 |
| dbfddb57bfeaeae901d79d97235e165d | 3700 | rcluster2 |
| 15466a299bd6e3c6e559a61d97fe0562 | 2100 | rcluster3 |
As in previous cases, PhyloBayes was first run to estimate the birth-death hyperparameters, which would be fixed in the later calibrated runs:
../pb -d 08d545f1bb470785ba7a046efdafcfe7.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 rcluster1nocal
../pb -d dbfddb57bfeaeae901d79d97235e165d.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 rcluster2nocal
../pb -d 15466a299bd6e3c6e559a61d97fe0562.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 rcluster3nocal
The chains were paused on Sunday 02/19/2017, 12:22 PM (rcluster1nocal, at 18623 generations), 12:23 PM (rcluster2nocal, at 22580 generations), and 12:24 PM (rcluster3nocal, at 31709 generations). The ESS values obtained are given below:
| parameter | rcluster1 | rcluster2 | rcluster3 |
|---|---|---|---|
| states | 18623 | 22580 | 31709 |
| loglik | 8994 | 9717 | 17692 |
| length | 2445 | 1487 | 4290 |
| sigma | 7608 | 7490 | 19974 |
| mu | 1010 | 1880 | 2416 |
| meanrates | 2750 | 3148 | 11542 |
| p1 | 7343 | 3393 | 19669 |
| p2 | 2427 | 1478 | 9880 |
| alpha | 9135 | 10517 | 17294 |
| nmode | - | - | - |
| stat | 2956 | 2714 | 4733 |
| statalpha | - | - | - |
| rrent | 7305 | 7120 | 10937 |
| meanrr | 1279 | 1564 | 2283 |
| kappa | - | - | - |
| allocent | - | - | - |
Based on the longest autocorrelation times detected in the three analyses (rcluster1: mu, 16.59; rcluster2: p2, 13.75; rcluster3: meanrr, 12.50), the readdiv program was run as follows in order to estimate the mean posterior values of p1 and p2 as well as their standard deviations:
../readdiv -x 1862 50 rcluster1nocal
../readdiv -x 2258 50 rcluster2nocal
../readdiv -x 3171 50 rcluster3nocal
Resulting in the following estimates of the parameters of interest:
| chain | p1 | p2 | ESS |
|---|---|---|---|
| rcluster1 | 0.00118489 +/- 0.000991948 | 0.00045689 +/- 0.00025091 | 335 |
| rcluster2 | 0.00166149 +/- 0.00130985 | 0.000593605 +/- 0.000266553 | 406 |
| rcluster3 | 0.00151578 +/- 0.00120178 | 0.000427276 +/- 0.000240445 | 570 |
These values served as hyperparameters determining the shape of the birth-death prior used in the calibrated analyses:
../pb -d 08d545f1bb470785ba7a046efdafcfe7.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00118489 0.00045689 -cal calib_root_corrected -sb -gtr -dgam 4 rcluster1cal
../pb -d dbfddb57bfeaeae901d79d97235e165d.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00166149 0.000593605 -cal calib_root_corrected -sb -gtr -dgam 4 rcluster2cal
../pb -d 15466a299bd6e3c6e559a61d97fe0562.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00151578 0.000427276 -cal calib_root_corrected -sb -gtr -dgam 4 rcluster3cal
The calibrated analyses were soft-stopped on Tuesday 02/21/2017 at 11:52 PM (rcluster1cal; 10737 generations), 11:53 PM (rcluster2cal; 12740 generations), and 11:54 PM (rcluster3cal; 16351 generations). The following effective sample sizes were reported by Tracer:
| parameter | rcluster1 | rcluster2 | rcluster3 |
|---|---|---|---|
| states | 10737 | 12740 | 16351 |
| loglik | 4594 | 5358 | 8938 |
| length | 1423 | 865 | 2116 |
| sigma | 5989 | 4081 | 9860 |
| mu | 616 | 935 | 1348 |
| meanrates | 980 | 1581 | 4620 |
| p1 | - | - | - |
| p2 | - | - | - |
| scale | 657 | 598 | 3390 |
| alpha | 5968 | 5878 | 8867 |
| nmode | - | - | - |
| stat | 1345 | 1397 | 1947 |
| statalpha | - | - | - |
| rrent | 4098 | 3299 | 5291 |
| meanrr | 678 | 802 | 1240 |
| kappa | - | - | - |
| allocent | - | - | - |
These values corresponded to the maximum autocorrelation times of 15.69 (rcluster1: mu), 19.18 (rcluster2: scale), and 11.86 (rcluster3: meanrr). Based on those, readdiv analyses were launched using the following commands:
../readdiv -x 1074 30 rcluster1cal
../readdiv -x 1274 30 rcluster2cal
../readdiv -x 1635 30 rcluster3cal
The resulting time trees are shown below:
rcluster1 PhyloBayes time tree
rcluster2 PhyloBayes time tree
rcluster3 PhyloBayes time tree
The underflows (in bold) and overflows (in italics) of the posterior mean divergence dates with respect to the calibration bounds are summarized in the table below:
| Calibration | Lower bound | Upper bound | Node number | rcluster1 | rcluster2 | rcluster3 |
|---|---|---|---|---|---|---|
| Aipichthys | 98.0 | 128.82 | 119 | 125.479 | 121.048 | 124.336 |
| Berybolcensis | 49.0 | 109.29 | 132 | 86.7686 | 85.2699 | 82.6209 |
| Calatomus | 11.9 | 43.95 | 210 | 35.104 | 31.0816 | 34.3208 |
| Chaetodontidae | 29.62 | 59.26 | 220 | 59.3976 | 59.7278 | 55.4057 |
| Eobuglossus | 41.2 | 53.88 | 189 | 45.7935 | 44.9722 | 43.1575 |
| Eucoelopoma | 54.17 | 95.58 | 162 | 68.4367 | 63.5752 | 73.5542 |
| Homonotichthys | 93.9 | 116.35 | 123 | 115.157 | 115.413 | 114.048 |
| Mcconichthys | 63.1 | 93.51 | 124 | 48.2893 | 58.3972 | 63.1317 |
| Ramphexocoetus | 49.0 | 80.52 | 173 | 78.1323 | 74.7231 | 76.9205 |
| Tarkus | 49.0 | 53.93 | 229 | 50.2866 | 50.3269 | 51.2596 |
| Mene | 55.2 | 95.64 | 192 | 78.3756 | 67.9794 | 67.6891 |
| Eastmanelepes | 49.0 | 61.61 | 195 | 55.8886 | 54.9043 | 56.9354 |
More complete information on the overflows and underflows is available in the _sample.calib files.
The analyses were re-run under a new calibration file that contained the full 29-calibration point dataset as well as the corrected maximum placed on the (Polymixia + Aphredoderus) calibration:
../pb -d 08d545f1bb470785ba7a046efdafcfe7.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00118489 0.00045689 -cal extendcorrectcalib -sb -gtr -dgam 4 rcluster1ext
../pb -d dbfddb57bfeaeae901d79d97235e165d.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00166149 0.000593605 -cal extendcorrectcalib -sb -gtr -dgam 4 rcluster2ext
../pb -d 15466a299bd6e3c6e559a61d97fe0562.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00151578 0.000427276 -cal extendcorrectcalib -sb -gtr -dgam 4 rcluster3ext
These chains were soft-stopped on Thursday 02/23/2017 at 6:08 PM (rcluster1ext: 7025 generations), 6:21 PM (rcluster2ext: 8130 generations), and 6:22 PM (rcluster3ext: 10460 generations), after their parameters reached the following effective sample sizes:
| parameter | rcluster1ext | rcluster2ext | rcluster3ext |
|---|---|---|---|
| states | 7025 | 8130 | 10460 |
| loglik | 2966 | 2972 | 6198 |
| length | 849 | 525 | 1225 |
| sigma | 4072 | 2749 | 6702 |
| mu | 526 | 578 | 1142 |
| meanrates | 468 | 1393 | 3837 |
| p1 | - | - | - |
| p2 | - | - | - |
| scale | 499 | 692 | 1815 |
| alpha | 3799 | 3523 | 5257 |
| nmode | - | - | - |
| stat | 1101 | 1223 | 1405 |
| statalpha | - | - | - |
| rrent | 2981 | 2309 | 4156 |
| meanrr | 592 | 413 | 939 |
| kappa | - | - | - |
| allocent | - | - | - |
The readdiv program was started using the commands shown below, which were based on the greatest autocorrelation times observed in Tracer (rcluster1ext: meanrates, 13.50; rcluster2ext: meanrr, 17.70; rcluster3ext: meanrr, 10.02):
../readdiv -x 703 30 rcluster1ext
../readdiv -x 813 30 rcluster2ext
../readdiv -x 1046 30 rcluster3ext
Yielding the following time trees:
rcluster1ext PhyloBayes time tree
rcluster2ext PhyloBayes time tree
rcluster3ext PhyloBayes time tree
In the table below, the lower and upper bounds placed on the calibrated nodes (including the root) are compared with the corresponding posterior mean ages, with overflows in italics and underflows in bold:
| Calibration | Lower bound | Upper bound | Node # | rcluster1ext | rcluster2ext | rcluster3ext |
|---|---|---|---|---|---|---|
| Root | 98.0 | 143.0 | 118 | 127.786 | 129.419 | 127.541 |
| Hoplopteryx | 93.9 | 128.0 | 129 | 122.357 | 124.034 | 123.49 |
| Calatomus | 11.9 | 63.3 | 210 | 38.2518 | 34.1469 | 43.5446 |
| Triodon | 50.5 | 69.6 | 230 | 58.5936 | 59.6328 | 66.5676 |
| Archaeotetraodon | 32.02 | 58.5 | 231 | 31.8787 | 32.4746 | 32.1466 |
| Tarkus | 49.0 | 61.6 | 229 | 50.1895 | 50.6793 | 53.2813 |
| Luvarus | 54.17 | 80.8 | 219 | 76.5659 | 71.8821 | 77.8735 |
| Proacanthurus | 49.0 | 69.3 | 222 | 51.7481 | 51.2993 | 53.0406 |
| Chaetodontidae | 29.62 | 66.0 | 220 | 74.0795 | 67.686 | 62.0793 |
| Archaeus | 54.17 | 80.8 | 193 | 57.4952 | 59.8435 | 66.1987 |
| Eastmanalepes | 49.0 | 69.3 | 195 | 54.8664 | 56.9666 | 62.56 |
| Mene | 55.2 | 94.6 | 192 | 73.8609 | 68.2343 | 68.6717 |
| Eobothus | 49.0 | 79.7 | 186 | 63.1293 | 62.7777 | 58.3339 |
| Eobuglossus | 41.2 | 66.6 | 189 | 44.2319 | 46.5881 | 42.7674 |
| Sphyraena | 49.0 | 79.7 | 183 | 78.4609 | 72.0311 | 77.8716 |
| Rhamphexocoetus | 49.0 | 79.7 | 173 | 75.7623 | 74.3786 | 75.5472 |
| Mahengechromis | 45.46 | 67.4 | 174 | 61.8942 | 64.2091 | 60.5732 |
| Gasterorhamphosus | 69.71 | 111.3 | 154 | 79.93 | 77.2257 | 84.5181 |
| Prosolenostomus | 49.0 | 93.7 | 155 | 73.8318 | 72.155 | 77.2461 |
| Eocoelopoma | 54.17 | 94.2 | 162 | 65.0129 | 64.8334 | 74.7656 |
| Berybolcensis | 49.0 | 93.7 | 132 | 81.1629 | 83.2881 | 76.0081 |
| Aipichthys | 98.0 | 143.0 | 119 | 123.691 | 120.623 | 121.403 |
| Cretzeus | 69.71 | 111.3 | 125 | 107.124 | 105.645 | 106.574 |
| Rhinocephalus | 53.7 | 94.4 | 128 | 59.9218 | 63.9852 | 62.5429 |
| Zenopsis | 32.02 | 91.4 | 126 | 35.779 | 34.08 | 36.6441 |
| Homonotichthys | 93.9 | 116.35 | 123 | 114.559 | 115.027 | 112.495 |
| Massamorichthys | 58.551 | 109.6 | 124 | 41.1684 | 51.1745 | 57.5285 |
| Turkmene | 54.17 | 122.5 | 120 | 55.727 | 55.6773 | 65.1835 |
| Trachipterus | 5.33 | 94.0 | 121 | 21.0129 | 16.9183 | 33.7002 |
More detailed information about the underflows and overflows is available in the _sample.calib files.
In the first step of the analysis, baseml was run on a tree without branch length information whose root was calibrated at 113.02 Ma (in the units of hundreds of millions of years, i.e., denoted as “1.1302”). The settings used in the baseml control file were identical to those described above.
The preliminary analyses finished in 3:08:57 (rcluster1), 1:21:47 (rcluster2), and 52:53 (rcluster3), yielding the following rates of substitution per 100 million years:
| partition | rcluster1 | rcluster2 | rcluster3 |
|---|---|---|---|
| rate | 0.151141 | 0.173575 | 0.099392 |
These were used to determine the values of the rgene_gamma prior in the following way (assuming an \(\alpha\) value of 2):
## [1] "beta_rcluster1 = 132.326767720208"
## [1] "beta_rcluster2 = 115.22396658505"
## [1] "beta_rcluster3 = 201.223438506117"
Leading to the \(\beta\) values shown below:
| parameter | median1 | median2 | median3 |
|---|---|---|---|
| \(\alpha\) (shape) | 2 | 2 | 2 |
| \(\beta\) (rate) | 132.33 | 115.22 | 201.22 |
The prior placed on the variance of the logarithm of the rate was retained from the previous PAML analyses (i.e., a gamma distribution with \(\alpha = 1\) and \(\beta = 10\)). The prior distributions assigned to the mean substitution rates of the three partitions are shown below. The means of the rgene_gamma and sigma2_gamma hyperpriors were then used as the mean and variation of the overall rate distributions shown further below:
Aside from the values of rgene_gamma and sigma2_gamma, the mcmcTREE control files created for the rcluster partitions were identical to those written for the “median” runs, including the use of the corrected calibrated tree. To estimate the Hessian matrix for branch lengths, mcmcTREE was initially run under the usedata = 3 option and finished up in 3:35 (rcluster1), 3:02 (rcluster2), and 1:15 (rcluster3). The Hessian matrices was stored in out.BV files, which were subsequently renamed as in.BV, with the usedata value changed accordingly to 2.
The mcmcTREE chains reached the length and sample sizes specified in their respective control files after 35:02:29 (rcluster1), 35:22:06 (rcluster2), and 34:42:23 (rcluster3), with the following results:
rcluster1 PAML time tree
rcluster2 PAML time tree
rcluster3 PAML time tree
The analyses were then repeated under the extended set of calibrations that also incorporated the corrected value of the upper bound assigned to the (Polymixia + Aphredoderus) divergence. The initial analyses (run under the usedata = 3 option) finished up in 2:44 (rcluster1ext), 2:11 (rcluster2ext), and 1:03 (rcluster3ext). As in the previous analyses, the usedata value in the respective control files was changed to 2 and the out.BV files were renamed as in.BV to serve as an input for the next round of analyses, which was completed after 35:24:28 (rcluster1ext), 35:46:03 (rcluster2ext), and 35:01:07 (rcluster3ext). The resulting time trees are shown below:
rcluster1ext PAML time tree
rcluster2ext PAML time tree
rcluster3ext PAML time tree
The rcluster partitions were analyzed in BEAST under a set of 13 calibration points (including one placed on the root, and with the corrected upper bound value of 116.35 Ma assigned to the (Polymixia + Aphredoderus) node), the random local clock model, and the fixed topology operator mix with default values. The length of all three chains was set to 500 million generations, sampling each 1000th generation:
beast -threads 12 -beagle_SSE rcluster1rlc.xml
beast -threads 12 -beagle_SSE rcluster2rlc.xml
beast -threads 12 -beagle_SSE rcluster3rlc.xml
(Note: Not finished because of the leontocebus power outage; not restarted – root age values did not seem promising.)
The initial PhyloBayes run was conducted without calibrations and served to estimate the birth-death model hyperparameters:
../pb -d 8be7c94dcf2d71970c663f6710af40d7.phy -T 12_no_outgroups_scheme_3.tre -ln -bd -gtr -dgam 4 bestnocal
The analysis was soft-stopped on Monday 04/24/2017 at 10:44 PM at 6137 generations, with the following ESS values:
| parameter | ESS |
|---|---|
| states | 6137 |
| loglik | 3195 |
| length | 2230 |
| sigma | 1835 |
| mu | 340 |
| meanrates | 656 |
| p1 | 523 |
| p2 | 210 |
| alpha | 1767 |
| nmode | - |
| stat | 2138 |
| statalpha | - |
| rrent | 2576 |
| meanrr | 359 |
| kappa | - |
| allocent | - |
Based on these and on the maximum autocorrelation time shown by Tracer (p2; 26.34), the chain was summarized as follows:
../readdiv -x 614 27 bestnocal
This command provided the following summary of the posterior distributions for p1 and p2:
| p1 | p2 |
|---|---|
| 0.00127672 +/- 0.00105001 | 0.000816072 +/- 0.00030442 |
The calibrated analysis was thus run under the following settings:
../pb -d 8be7c94dcf2d71970c663f6710af40d7.phy -T 12_no_outgroups_scheme_3.tre -ln -bd 0.00127672 0.000816072 -cal calib_root_corrected -sb -gtr -dgam 4 bestcal
The chain was paused on Thursday 04/27/2017 at 9:41 AM at 11946 generations. The effective sample sizes of individual parameters were as follows:
| parameter | ESS |
|---|---|
| states | 11946 |
| loglik | 5175 |
| length | 4494 |
| sigma | 4181 |
| mu | 748 |
| meanrates | 915 |
| p1 | - |
| p2 | - |
| scale | 1351 |
| alpha | 3754 |
| nmode | - |
| stat | 3739 |
| statalpha | - |
| rrent | 5510 |
| meanrr | 771 |
| kappa | - |
| allocent | - |
The chain was then summarized under the following settings, which were again based on the maximum autocorrelation time observed in Tracer (mu; 14.38):
../readdiv -x 1195 25 bestcal
Best partition PhyloBayes time tree
A comparison of the mean estimated node ages with the lower and upper bounds placed on their respective priors is presented in the table below, with underflows in bold:
| Calibration | Lower bound | Upper bound | Node number | Mean |
|---|---|---|---|---|
| Aipichthys | 98.0 | 128.82 | 119 | 123.787 |
| Berybolcensis | 49.0 | 109.29 | 132 | 83.582 |
| Calatomus | 11.9 | 43.95 | 210 | 34.2084 |
| Chaetodontidae | 29.62 | 59.26 | 220 | 58.0471 |
| Eastmanelepes | 49.0 | 61.61 | 195 | 53.8745 |
| Eobuglossus | 41.2 | 53.88 | 189 | 46.8092 |
| Eucoelopoma | 54.17 | 95.58 | 162 | 60.108 |
| Homonotichthys | 93.9 | 116.35 | 123 | 113.319 |
| Mcconichthys | 63.1 | 93.51 | 124 | 46.0858 |
| Mene | 55.2 | 95.64 | 192 | 72.7587 |
| Ramphexocoetus | 49.0 | 80.52 | 173 | 77.9238 |
| Root | 98.0 | 143.0 | 118 | 141.056 |
| Tarkus | 49.0 | 53.93 | 229 | 51.1028 |
More information on overflows and underflows is given in the respective _sample.calib file.
In order to make the analysis more directly comparable to the results obtained using BEAST (see below), a root calibration of 120.5 Ma (rather than 113.02 Ma) was used in baseml to estimate the mean susbtitution rate. The time unit was set to one hundred million years (i.e., the calibration was denoted as “1.205”). All other settings were identical to those employed in the baseml analyses reported above.
Baseml estimated the rate of evolution of the partition to be 0.237600 substitutions per one time unit (i.e., per 100 million years). The rate parameter (\(\beta\)) of rgene_gamma was set to a value such that \(\frac{2}{\beta}\) equals the rate above expressed in terms of tens of millions of years:
## [1] "beta_best = 84.1750841750842"
Resulting in the following rgene_gamma distribution:
The hyperprior assigned to sigma2_gamma was identical to that used in the previous analyses, with \(\sigma^2 \sim \Gamma\left(\alpha = 1, \beta = 10\right)\). The overall rate distribution can be visualized as follows (with the mean and standard deviation set to the expected values of the respective hyperpriors):
Next, a Hessian matrix was generated and stored in out.BV using the following command:
mcmctree bestmcmctree.ctl
After the appropriate modifications of the control and output files, a second mcmcTREE analysis was started using the same command.
| Chain | Hessian matrix estimation | Time tree analysis |
|---|---|---|
| best | 2:27 | 50:40:01 |
| best2 | 4:52 | |
| best3 | 4:52 | |
| best4 | 4:43 | |
| best5 | 5:22 | |
| best6 | 4:46 | |
| best7 | 4:39 | |
| best8 | 5:00 | |
| best9 | 5:19 | |
| best10 | 5:20 |
PAML time tree of the best partition, run 1
The root was assigned an exponential prior whose mean was equal to the mean of the uniform prior used in PhyloBayes and in mcmcTREE, and which was truncated at 143 Ma to prevent BEAST from inferring excessively old dates for the root:
root <- (143 - 98)/2
paste("Root", root, sep = " = ")
## [1] "Root = 22.5"
The prior placed on the (Polymixia + Aphredoderus) node (calibrated by Homonotichthys) was constructed using the corrected maximum of 116.35 Ma. The analysis was run under the lognormal relaxed clock model and the fixed-topology operator mix, in which the operators acting on the ucld.mean and ucld.stdev parameters received tuning values of 0.9 and weights of 6.0. MCMC was run for 200 million generations, with a sampling frequency of 1 sample per 1000 generations.
beast -threads 12 -beagle_SSE bestpartition-run1.xml